Abstract — We consider the network communication scenario over directed acyclic networks with unit capacity edges in which a number of sources $s_i$, each holding independent unit-entropy information $X_i$, wish to communicate the sum $\sum_i X_i$ to a set of terminals $t_j$. We show that in the case in which there are only two sources or only two terminals, communication is possible if and only if each source-terminal pair $s_i/t_j$ is connected by at least a single path. For the more general communication problem in which there are three sources and three terminals, we prove that a single path connecting the source-terminal pairs does not suffice to communicate $\sum_i X_i$. We then present an efficient encoding scheme which enables the communication of $\sum_i X_i$ for the three sources, three terminals case, given that each source-terminal pair is connected by two edge disjoint paths. Our encoding scheme includes a structural decomposition of the network at hand which may be found useful for other network coding problems as well.

Index Terms — network coding, function computation, multicast, distributed source coding.

I. Introduction 

The problem of function computation over networks is perhaps the most general information transmission problem one can formulate. Many problems studied in information theory can be cast as instances of it by defining the function appropriately. In the most general case, one could consider arbitrarily correlated sources over a noisy network with multiple terminals requesting arbitrary functions of the sources with respect to specified distortion constraints. It is evident that the problem in its full generality is quite challenging.

In this work we consider a problem setting in which the sources are independent and the network links are error-free, but capacity constrained. However, the topology of the
network can be quite complicated, such as an arbitrary directed 
acyclic graph. This serves as an abstraction of current-day 
computer networks at the higher layers. We investigate the 
problem of characterizing the network resources required to 
communicate the sum (over a finite field) of a certain number 
of sources over a network to multiple terminals. By network 
resources, we mean the number of edge disjoint paths between 
various source terminal pairs in the network. Our work can be 
considered as using network coding to compute and multicast 
sums (or more generally functions) of the messages, as against 
multicasting the messages themselves. 
The problem of multicast has been studied intensively under the paradigm of network coding. The seminal work of Ahlswede et al. [1] showed that under network coding the multicast capacity is the minimum of the maximum flows from the source to each individual terminal node. The work of Li et al. [2] showed that linear network codes are sufficient to achieve the multicast capacity. The algebraic approach to network coding proposed by Koetter and Médard [3] provided simpler proofs of these results.

The problem of multicasting sums of sources is an important component in enabling the multicast of correlated sources over a network (using network coding). Network coding for correlated sources was first examined by Ho et al. [4]. The work of Ramamoorthy et al. [5] showed that in general separating distributed source coding and network coding is suboptimal except in the case of two sources and two terminals. The work of Wu et al. [6] presented a practical approach to multicasting correlated sources over a network. Reference [6] also stated the problem of communicating sums over networks using network coding, and called it the Network Arithmetic problem. We elaborate on related work in the upcoming Section II.

In this work, we present (sometimes tight) upper and lower 
bounds on the network resources required for communicating 
the sum of sources over a network under certain special cases. 

A. Main Contributions 

We consider networks that can be modeled as directed acyclic graphs, with unit capacity edges. Let $G = (V, E)$ represent such a graph. There is a set of source nodes $S \subset V$ that observe independent unit-entropy sources $X_i, i \in S$, and a set of terminal nodes $T \subset V$ that seek to obtain $\sum_{i \in S} X_i$, where the sum is over a finite field. Our work makes the following contributions.

i) Characterization of necessary and sufficient conditions when either $|S| = 2$ or $|T| = 2$.

Suppose that $G$ is such that there are either two sources ($|S| = 2$) and an arbitrary number of terminals, or an arbitrary number of sources and two terminals ($|T| = 2$). The following conditions are necessary and sufficient for recovery of $\sum_{i \in S} X_i$ at all terminals in $T$:

$$\text{max-flow}(s_i - t_j) \geq 1 \text{ for all } s_i \in S \text{ and } t_j \in T.$$

Our proofs are constructive, i.e., we provide efficient algorithms for the network code assignment.

ii) Unit connectivity does not suffice when $|S|$ and $|T|$ are both greater than 2.

We present a network $G$ such that $|S| = |T| = 3$ in which the maximum flow between each source-terminal pair is at least 1 and (in contrast to the cases above) communicating the sum of sources is not possible.



iii) Sufficient conditions when $|S| = |T| = 3$.

Suppose that $G$ is such that $|S| = |T| = 3$. The following condition is sufficient for recovery of $\sum_{i \in S} X_i$ at all terminals in $T$:

$$\text{max-flow}(s_i - t_j) \geq 2 \text{ for all } s_i \in S \text{ and } t_j \in T.$$

Efficient algorithms for network code assignment are presented in this case as well.

iv) Development of a technique for structural decomposition 
of networks. 

We propose a labeling scheme for nodes and edges of the 
graph, that allows us to arrive at a class decomposition 
of the relevant networks. This significantly improves our 
ability to reason about the problem at hand. We believe 
that this technique may be of independent interest. 

Finally, we emphasize that while we work with the sum of sources function throughout our discussions, most of our results carry over to functions that are invariant under permutations of the sources.

This paper is organized as follows. We discuss background and related work in Section II and our network coding model in Section III. The characterization for the cases $|S| = 2, |T| = n$ and $|S| = n, |T| = 2$ is discussed in Sections IV and V, respectively. Our counter-example demonstrating that unit connectivity does not suffice for three sources and three terminals can be found in Section VI. Sections VII and VIII discuss the sufficient characterization in the case of three sources and three terminals, and Section IX presents the conclusions and possibilities for future work.

II. Background and Related Work 

Prior work of an information theoretic flavor in the area of function computation has mainly considered the case of two correlated sources $X$ and $Y$, with direct links between the sources and the terminal, where the terminal is interested in reconstructing a function $f(X, Y)$. In these works, the topology of the network is very simple; however, the structure of the correlation between $X$ and $Y$ may be arbitrary. In this setting, Körner & Marton [7] determine the rate region for encoding the modulo-2 sum of $X$ and $Y$ when they are uniform, correlated binary sources. The work of Orlitsky & Roche [8] determines the required rate for sending $X$ to a decoder with side information $Y$ that must reliably compute $f(X, Y)$. The result of [8] was extended to the case when both $X$ and $Y$ need to be encoded (under certain conditions) in [9]. Yamamoto [10] (generalizing the Wyner-Ziv result [11]) found the rate-distortion function for sending $X$ to a decoder with side information $Y$ that wants to compute $f(X, Y)$ within a certain distortion level (see also [12] for an extension). Nazer et al. [13] consider the problem of reliably reconstructing a function over a multiple-access channel (MAC) and finding the capacity of finite-field multiple access networks.

In contrast with previous studies, in this work we consider a problem setting in which the sources are independent and the network links are error-free, but capacity constrained. However, the topology of the network can be quite complicated, such as an arbitrary directed acyclic graph. This is well motivated since it is a good abstraction of current-day computer networks (at the higher layers). We investigate the problem of characterizing the network resources required to communicate the sum of a certain number of sources over a network to multiple terminals. Network resources can be measured in various ways. For example, one may specify the maximum flow between subsets of the source nodes and subsets of the terminal nodes in the network. In the current work, all of our characterizations are in terms of the maximum flow between various $s_i - t_j$ pairs, where $s_i$ ($t_j$) denotes a source (terminal) node. Previous work in this area includes the work of Ahlswede et al. [1], who introduced the concept of network coding and showed the capacity region for multicast. In multicast, the terminals are interested in reconstructing the actual sources. Numerous follow-up works have extended and improved the results of [1] in different ways. For example, [2], [3] considered multicast with linear codes. Ho et al. [14] proposed random network coding, examined the multicast of correlated sources over a network, and showed a tight capacity region for it that can be achieved by using random network codes. Follow-up works [5], [6] investigated practical approaches for achieving this goal. As shown in [6], the problem of communicating (multicasting) the sum (over a finite field) of sources over a network is a subproblem that can help facilitate practical approaches to this problem.

In this work we consider function computation under network coding. Specifically, we present network code assignment algorithms for the problem of multicasting the sum of sources over a network. As one would expect, supporting this requires fewer resources than multicasting the sources themselves. To the best of our knowledge, the first work to examine function computation in this setting is the work of Ramamoorthy [15], which considered the problem of multicasting sums of sources when there are either two sources or two terminals in the network. Subsequently, the work of Langberg and Ramamoorthy [16] showed that the characterization of [15] does not hold in the case of three sources and three terminals. Reference [16] proposed an alternate characterization in this case. The current paper is a revised and extended version of [15] and [16] that contains all the proofs and additional observations.

Rai and Dey [17] independently found the same counter-example found in our work [16]; however, their proof only shows that linear codes do not suffice for multicasting sums under the characterization of [15]. Their work also contains an alternate proof of the result in the case of $n$ sources and two terminals (see Section V). The work of Appuswamy et al. [18], [19] also considers function computation in the setting of error-free directed acyclic networks. In [18], [19], the emphasis is on the rate of the computation, where the rate refers to the maximum number of times a function can be computed per network usage. However, all their results are in the context of only one terminal, and for the most part do not provide constructive solutions.

III. Network coding model 

In our model, we represent the network as a directed graph $G = (V, E)$. The network contains a set of source nodes $S \subset V$ that are observing independent, discrete unit-entropy sources and a set of terminals $T \subset V$. We assume that each edge in the network has unit capacity and can transmit one symbol from a finite field of size $2^m$ per unit time (we are free to choose $m$ large enough). If a given edge has a higher capacity, it can be treated as multiple unit capacity edges. A directed edge $e$ between nodes $v_i$ and $v_j$ is represented as $(v_i \to v_j)$. Thus $\mathrm{head}(e) = v_j$ and $\mathrm{tail}(e) = v_i$. A path between two nodes $v_i$ and $v_j$ is a sequence of edges $\{e_1, e_2, \ldots, e_k\}$ such that $\mathrm{tail}(e_1) = v_i$, $\mathrm{head}(e_k) = v_j$ and $\mathrm{head}(e_i) = \mathrm{tail}(e_{i+1})$, $i = 1, \ldots, k-1$.

Our counter-example in Section VI considers arbitrary network codes. However, our constructive algorithms in Sections IV, V and VII shall use linear network codes. In linear network coding, the signal on an edge $(v_i \to v_j)$ is a linear combination of the signals on the incoming edges into $v_i$ and the source signal at $v_i$ (if $v_i \in S$). In this paper we assume that the source (terminal) nodes do not have any incoming (outgoing) edges from (to) other nodes. If this is not the case, one can always introduce an artificial source (terminal) connected to the original source (terminal) node by an edge of sufficiently large capacity that has no incoming (outgoing) edges. We shall only be concerned with networks that are directed acyclic and in which internal nodes have sufficient memory. Such networks can be treated as delay-free networks. Let $Y_{e_i}$ (such that $\mathrm{tail}(e_i) = v_k$ and $\mathrm{head}(e_i) = v_l$) denote the signal on the $i$-th edge in $E$ and let $X_j$ denote the $j$-th source. Then, we have

$$Y_{e_i} = \begin{cases} \sum_{\{e_j \,|\, \mathrm{head}(e_j) = v_k\}} f_{j,i}\, Y_{e_j} & \text{if } v_k \in V \setminus S, \\ \sum_{\{j \,|\, X_j \text{ observed at } v_k\}} a_{j,i}\, X_j & \text{if } v_k \in S, \end{cases} \qquad (1)$$

where the coefficients $a_{j,i}$ and $f_{j,i}$ are from $GF(2^m)$. Note that since the graph is directed acyclic, it is possible to express $Y_{e_i}$ for an edge $e_i$ in terms of the sources $X_j$. Suppose that there are $n$ sources $X_1, \ldots, X_n$. If $Y_{e_i} = \sum_{k=1}^{n} \beta_{e_i,k} X_k$, then we say that the global coding vector of edge $e_i$ is $\beta_{e_i} = [\beta_{e_i,1} \cdots \beta_{e_i,n}]$. For brevity we shall mostly use the term coding vector instead of global coding vector in this paper. We say that a node $v_i$ (or edge $e_i$) is downstream of another node $v_j$ (or edge $e_j$) if there exists a path from $v_j$ (or $e_j$) to $v_i$ (or $e_i$).
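As an illustration of (1), the global coding vectors can be computed edge by edge in a topological order of the edges, since every quantity an edge needs is available once the edges entering its tail have been processed. The Python sketch below is our own and not part of the original development; it assumes $m = 8$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (hex `0x11B`), 0-based source indices, and hypothetical helper names `gf_mul` and `global_coding_vectors`.

```python
from graphlib import TopologicalSorter

M = 8          # assumption: field GF(2^8)
POLY = 0x11B   # assumption: reduction polynomial x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiply two GF(2^8) elements (addition in GF(2^m) is bitwise XOR)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def global_coding_vectors(edges, sources_at, n, a, f):
    """
    edges: list of (tail, head) pairs; edge i is edges[i].
    sources_at: dict node -> list of 0-based source indices observed there.
    a[(j, i)]: coefficient a_{j,i} of X_j on edge i (edges leaving a source node).
    f[(j, i)]: coefficient f_{j,i} of Y_{e_j} on edge i (all other edges).
    Returns beta with Y_{e_i} = sum_k beta[i][k] * X_k.
    """
    # Edge i depends on every edge j whose head is the tail of edge i.
    deps = {i: [j for j, (_, h) in enumerate(edges) if h == edges[i][0]]
            for i in range(len(edges))}
    beta = {}
    for i in TopologicalSorter(deps).static_order():
        tail = edges[i][0]
        vec = [0] * n
        if tail in sources_at:
            for j in sources_at[tail]:
                vec[j] = a.get((j, i), 0)
        else:
            for j in deps[i]:
                c = f.get((j, i), 0)
                vec = [v ^ gf_mul(c, w) for v, w in zip(vec, beta[j])]
        beta[i] = vec
    return [beta[i] for i in range(len(edges))]

# Two sources feed a node u, which sends X_0 + 2*X_1 towards t.
edges = [("s1", "u"), ("s2", "u"), ("u", "t")]
beta = global_coding_vectors(edges, {"s1": [0], "s2": [1]}, 2,
                             a={(0, 0): 1, (1, 1): 1},
                             f={(0, 2): 1, (1, 2): 2})
print(beta[2])  # [1, 2]
```

Processing edges rather than nodes mirrors (1), where each coefficient $f_{j,i}$ is attached to a pair of edges.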

IV. Case of two sources and n terminals 

In this section we state and prove the rate region for the network arithmetic problem when there are two sources and $n$ terminals. Before embarking on this proof, we review the concept of greedy encoding that will be used throughout the paper. In what follows, we assume that our network $G$ has unit capacity edges, and our source nodes $s_i$ generate information $X_i$ of unit entropy.

Definition 1: Greedy encoding. Consider a graph $G = (V, E)$ with two source nodes $s_1$ and $s_2$ and an edge $e' = (u \to v) \in E$. Suppose that the coding vector on each edge $e$ entering $u$ has only 0 or 1 entries, i.e., $\beta_e = [\beta_{e,1}\ \beta_{e,2}]$, where $\beta_{e,i} \in \{0, 1\}$ for all $i = 1, 2$. We say that the encoding on edge $e'$ is greedy if for $i = 1, 2$ we have

$$\beta_{e',i} = \begin{cases} 0 & \text{if } \beta_{e,i} = 0 \ \forall e \text{ entering } u, \\ 1 & \text{otherwise.} \end{cases}$$

A coding vector assignment for $G$ is said to be greedy if the encoding on each edge in $G$ is greedy.
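Greedy encoding amounts to taking the entrywise OR of the incoming coding vectors, evaluated in topological order. The Python sketch below is a minimal illustration under our own assumptions: an edge leaving $s_k$ carries the unit vector for $X_k$, nodes are plain strings, and `greedy_encoding` and `can_recover_sum` are hypothetical names. With two sources, a terminal can form $X_1 + X_2$ exactly when the vectors it receives span $[1\ 1]$.

```python
from graphlib import TopologicalSorter

def greedy_encoding(edges, sources):
    """
    edges: list of (tail, head) pairs of a DAG; sources: [s_1, s_2].
    Returns the 0/1 coding vector of every edge under greedy encoding.
    """
    n = len(sources)
    deps = {i: [j for j, (_, h) in enumerate(edges) if h == edges[i][0]]
            for i in range(len(edges))}
    beta = {}
    for i in TopologicalSorter(deps).static_order():
        tail = edges[i][0]
        if tail in sources:
            # An edge leaving source s_k carries X_k alone.
            k = sources.index(tail)
            beta[i] = tuple(int(k == r) for r in range(n))
        else:
            # Entry r is 1 iff some edge entering the tail has entry r equal to 1.
            beta[i] = tuple(int(any(beta[j][r] for j in deps[i])) for r in range(n))
    return [beta[i] for i in range(len(edges))]

def can_recover_sum(edges, beta, t):
    """Two sources: t gets X_1 + X_2 iff it sees [1 1], or both [1 0] and [0 1]."""
    seen = {beta[i] for i, (_, h) in enumerate(edges) if h == t}
    return (1, 1) in seen or {(1, 0), (0, 1)} <= seen

edges = [("s1", "u"), ("s2", "u"), ("u", "t1"),
         ("s1", "t2"), ("s2", "t2"), ("s2", "t3")]
beta = greedy_encoding(edges, ["s1", "s2"])
print([can_recover_sum(edges, beta, t) for t in ("t1", "t2", "t3")])
# [True, True, False]: t3 is not downstream of s1
```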

A useful property of greedy encoding is given below. 

Lemma 1: Suppose that we perform greedy encoding on a graph $G = (V, E)$ with two source nodes $s_1$ and $s_2$ holding information $X_1$ and $X_2$ respectively. Consider a vertex $u$ that is downstream of a subset of source nodes, indexed by the set $B \subseteq \{1, 2\}$, i.e., $u$ is downstream of $s_i$ if and only if $i \in B$. Then for any edge $e$ leaving $u$ it holds that $\beta_{e,i} = 1$ if and only if $i \in B$. Moreover, if $u$ is a terminal node, then $u$ can recover the sum $\sum_{i \in B} X_i$.

Proof: Follows directly from Definition 1. ■
Remark 1: Greedy encoding can be performed in the case of two sources, since if a node only receives either $X_1$ or $X_2$, it just forwards it. Alternatively, if it receives both of them or $X_1 + X_2$, then it just forwards $X_1 + X_2$. However, this form of greedy encoding does not seem to have a natural generalization that is useful in our problem setting when the number of sources is higher. For example, if there are three sources and a node receives the combinations $X_1 + X_2$ and $X_2 + X_3$, it cannot compute $X_1 + X_2 + X_3$.
The main result of this section is the following. 

Theorem 1: Consider a directed acyclic graph $G = (V, E)$ with unit capacity edges, two source nodes $s_1$ and $s_2$ and $n$ terminal nodes $t_1, \ldots, t_n$ such that

$$\text{max-flow}(s_i - t_j) \geq 1 \text{ for all } i = 1, 2 \text{ and } j = 1, \ldots, n.$$

Assume that at each source node $s_i$ there is a unit-rate source $X_i$, and that the $X_i$'s are independent. Then, there exists an assignment of coding vectors to all edges such that each $t_j$, $j = 1, \ldots, n$, can recover $X_1 + X_2$.

The basic idea of the proof in the case of two sources and 
n terminals is greedy encoding. 

Proof of Theorem 1: Consider any terminal node $t_j$. As we assume that max-flow$(s_i - t_j) \geq 1$ for all $i = 1, 2$, it holds that $t_j$ is downstream of both $s_1$ and $s_2$. Thus, by Lemma 1, under greedy encoding $t_j$ can recover $X_1 + X_2$. ■

Note that if any of the conditions in the statement of Theorem 1 are violated, then some terminal will be unable to compute $X_1 + X_2$. For example, if max-flow$(s_1 - t_j) < 1$, then any decoded signal $Y$ at $t_j$ is solely a function of $X_2$, so that $H(Y \mid X_2) = 0$, whereas $H(X_1 + X_2 \mid X_2) = H(X_1) = 1$. We conclude that $Y$ cannot be $X_1 + X_2$.

V. Case of n sources and two terminals 

We now present the rate region for the situation when there 
are n sources and two terminals such that each terminal wants 
to recover the sum of the sources. 

To show the main result we first demonstrate that the 
original network can be transformed into another network 
where there exists exactly one path from each source to each 
terminal. By a simple argument it then follows that coding 
vectors can be assigned so that the terminals recover the sum
of the sources. 

Theorem 2: Consider a directed acyclic graph $G = (V, E)$ with unit capacity edges, $n$ source nodes $s_1, s_2, \ldots, s_n$ and two terminal nodes $t_1$ and $t_2$ such that

$$\text{max-flow}(s_i - t_j) \geq 1 \text{ for all } i = 1, \ldots, n \text{ and } j = 1, 2.$$

Assume that the source nodes observe independent unit-entropy sources $X_i, i = 1, \ldots, n$. Then, there exists an assignment of coding vectors such that each terminal can recover the sum of the sources $\sum_{i=1}^{n} X_i$.

To simplify our proof, we modify the graph $G$ of Theorem 2 by introducing virtual source nodes $s_i', i = 1, \ldots, n$, virtual terminals $t_j', j = 1, 2$ and virtual unit-capacity edges $s_i' \to s_i, i = 1, \ldots, n$ and $t_j \to t_j', j = 1, 2$. These additions do not change the connectivity constraints specified in Theorem 2 and can be done w.l.o.g. Let the new set of sources be denoted $S' = \{s_1', \ldots, s_n'\}$, and the modified graph $G'$. Notice that in $G'$ it holds that max-flow$(s_i' - t_j') = 1$ for all $i = 1, \ldots, n$ and $j = 1, 2$. We also need the following definitions.

Definition 2: Exactly one path condition. Consider two nodes $v_1$ and $v_2$ such that there is a path $P$ between $v_1$ and $v_2$. We say that there exists exactly one path between $v_1$ and $v_2$ if there does not exist another path $P'$ between $v_1$ and $v_2$ such that $P' \neq P$.
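Whether Definition 2 holds can be tested by counting paths with dynamic programming over a topological order: the number of paths from $v_1$ to a node is the sum of the counts over its incoming edges. The Python sketch below is ours (function names hypothetical); it treats parallel edges as distinct paths, in line with the edge-sequence definition of a path in Section III.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def count_paths(edges, src, dst):
    """Number of distinct directed paths (edge sequences) from src to dst in a DAG."""
    preds = defaultdict(list)
    nodes = {src, dst}
    for u, v in edges:
        preds[v].append(u)  # a parallel edge appears once per copy
        nodes.update((u, v))
    order = TopologicalSorter({x: set(preds[x]) for x in nodes}).static_order()
    paths = defaultdict(int)
    for x in order:
        # Every predecessor of x has already been assigned its path count.
        paths[x] = 1 if x == src else sum(paths[u] for u in preds[x])
    return paths[dst]

def exactly_one_path(edges, src, dst):
    return count_paths(edges, src, dst) == 1

print(count_paths([("a", "b"), ("b", "c"), ("a", "c")], "a", "c"))  # 2
print(exactly_one_path([("a", "b"), ("b", "c")], "a", "c"))         # True
```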

Definition 3: Minimality. A graph $G = (V, E)$ with $n$ source nodes $s_1, s_2, \ldots, s_n$ and two terminal nodes $t_1$ and $t_2$ is said to be minimal with respect to the connectivity requirements of Theorem 2 if the removal of any edge from $E$ violates one of the requirements (i.e., for some $i$ and $j$ it holds that max-flow$(s_i - t_j) < 1$).
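Since every edge has unit capacity, max-flow$(s_i - t_j) \geq 1$ simply says that $t_j$ is reachable from $s_i$, so Definition 3 can be checked with one reachability test per edge deletion. The Python sketch below (ours; names hypothetical) does this and also shows one simple way of pruning a graph to a minimal subgraph by greedy edge deletion. A single pass suffices because the requirements are monotone: an edge kept once stays necessary as further edges are removed. The pruning loop only illustrates minimality; it is not claimed to reproduce the argument behind Lemma 2 in the Appendix.

```python
from collections import defaultdict

def reachable(edges, src):
    """Set of nodes reachable from src (including src)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    seen, stack = {src}, [src]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def meets_requirements(edges, sources, terminals):
    """max-flow(s_i - t_j) >= 1 for all pairs, i.e. every t_j reachable from every s_i."""
    return all(set(terminals) <= reachable(edges, s) for s in sources)

def is_minimal(edges, sources, terminals):
    """Definition 3: deleting any single edge violates some requirement."""
    return all(not meets_requirements(edges[:k] + edges[k + 1:], sources, terminals)
               for k in range(len(edges)))

def prune_to_minimal(edges, sources, terminals):
    """Greedily delete edges whose removal keeps all requirements satisfied."""
    kept = list(edges)
    for e in list(edges):
        trial = list(kept)
        trial.remove(e)
        if meets_requirements(trial, sources, terminals):
            kept = trial
    return kept

edges = [("s1", "u"), ("s2", "u"), ("u", "t1"), ("u", "t2"), ("s1", "t1")]
G_star = prune_to_minimal(edges, ["s1", "s2"], ["t1", "t2"])
print(G_star)  # the redundant edge ("s1", "t1") is removed
print(is_minimal(G_star, ["s1", "s2"], ["t1", "t2"]))  # True
```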

To show that Theorem 2 holds we first need an auxiliary
lemma that we state below. The proof can be found in the 
Appendix. 

Lemma 2: Consider the graph $G'$ as constructed above with sources $s_1', \ldots, s_n'$ and terminals $t_1'$ and $t_2'$. There exists a subgraph $G^*$ of $G'$ such that $G^*$ is minimal and there exists exactly one path from $s_i'$ to $t_j'$ for $i = 1, \ldots, n$ and $j = 1, 2$ in $G^*$.

A. Proof of Theorem 2

From Lemma 2 we know that it is possible to find a subgraph $G^*$ of $G'$ such that there exists exactly one path from $s_i'$ to $t_j'$ for all $i = 1, \ldots, n$ and $j = 1, 2$. Suppose that we find $G^*$. We will show that each terminal can recover $\sum_{i=1}^{n} X_i$ by assigning appropriate local encoding responsibilities for every node. Consider a node $v \in G^*$ and let $\Gamma_o(v)$ and $\Gamma_i(v)$ represent the set of outgoing edges from $v$ and incoming edges into $v$, respectively. Let $Y_e$ represent the symbol transmitted on edge $e$. Each node operates in the following manner (a virtual source $s_i'$ simply transmits $X_i$):

$$Y_e = \sum_{e' \in \Gamma_i(v)} Y_{e'} \quad \text{for } e \in \Gamma_o(v), \qquad (2)$$

i.e., each node forwards the sum of its inputs on all output edges. In this case we observe that a terminal $t_j'$ can recover $\sum_{i=1}^{n} X_i$, since the received value at $t_j'$ can be expressed as the sum of the received values over all possible paths from the sources $s_1', \ldots, s_n'$ to $t_j'$. By construction, there is exactly one path from $s_i'$ to $t_j'$. Thus, $t_j'$ receives $\sum_{i=1}^{n} X_i$. ■

As in the previous section, it is clear that if any of the conditions in the statement of Theorem 2 are violated, then either terminal $t_1$ or $t_2$ will be unable to find $\sum_{i=1}^{n} X_i$.
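The forwarding rule (2) is easy to simulate. In the Python sketch below (our illustration; names hypothetical) elements of $GF(2^8)$ are stored as integers, so field addition is bitwise XOR. Because $X_i$ is added once per $s_i - t_j$ path and $X_i + X_i = 0$ in characteristic 2, a terminal receives $\sum_i (N_{ij} \bmod 2)\, X_i$, where $N_{ij}$ is the number of $s_i - t_j$ paths; this makes concrete why a unique path per pair suffices, and how a second path can cancel a source.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def forward_sums(edges, source_values):
    """
    Simulate rule (2): every edge leaving a node carries the XOR (the sum in
    GF(2^m)) of all symbols entering that node; an edge leaving a source
    carries the source symbol. Returns node -> sum of its incoming symbols.
    """
    preds = defaultdict(list)
    nodes = set(source_values)
    for u, v in edges:
        preds[v].append(u)
        nodes.update((u, v))
    out, received = {}, {}
    for x in TopologicalSorter({x: set(preds[x]) for x in nodes}).static_order():
        acc = 0
        for u in preds[x]:          # one term per incoming edge
            acc ^= out[u]
        received[x] = acc
        out[x] = source_values.get(x, acc)
    return received

X = {"s1": 0x11, "s2": 0x22, "s3": 0x44}
edges = [("s1", "a"), ("s2", "a"), ("a", "t1"), ("s3", "t1"),
         ("a", "b"), ("s3", "b"), ("b", "t2")]
r = forward_sums(edges, X)
print(hex(r["t1"]), hex(r["t2"]))   # 0x77 0x77: one path per pair

# A second s1 -> t2 path makes X_1 cancel at t2.
r = forward_sums(edges + [("s1", "b")], X)
print(hex(r["t2"]))                 # 0x66 = X_2 + X_3
```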

VI. Example of three sources and three terminals with independent unit-entropy sources

We now present our counter-example, which shows that one cannot generalize the characterization presented in Theorems 1 and 2 to the case of more sources or terminals. Namely, we present a network with three sources and three terminals, with at least one path connecting each source-terminal pair, in which the sum of sources cannot (under any network code) be transmitted (with zero error) to all three terminals.

Consider the network shown in Figure 1 with three source nodes and three terminal nodes such that the source nodes observe unit-entropy sources $X_1, X_2$ and $X_3$ that are also independent. All edges are unit capacity. As shown in Figure 1, the incoming edges into terminal $t_3$ contain the values $f(X_1, X_2)$ and $f'(X_2, X_3)$, where $f$ and $f'$ are some functions of the sources.

Suppose that $X_3 = 0$. This implies that $t_1$ should be able to recover $X_1 + X_2$ (which has entropy 1) from just $f(X_1, X_2)$. Moreover, note that each edge is unit capacity. Therefore, the entropy of $f(X_1, X_2)$ also has to be 1, i.e., there exists a one-to-one mapping between the set of values that $f(X_1, X_2)$ takes and the values of $X_1 + X_2$. In a similar manner we can conclude that there exists a one-to-one mapping between the set of values that $f'(X_2, X_3)$ takes and the values of $X_2 + X_3$. At terminal $t_3$, there needs to exist some function $h(f(X_1, X_2), f'(X_2, X_3)) = \sum_{i=1}^{3} X_i$. By the previous observations, this also implies the existence of a function $h'(X_1 + X_2, X_2 + X_3)$ that equals $\sum_{i=1}^{3} X_i$.

We now demonstrate that this is a contradiction. Consider the following sets of inputs: $X_1 = a, X_2 = 0, X_3 = c$ and $X_1' = a - b, X_2' = b, X_3' = c - b$. In both cases the inputs to the function $h'(\cdot, \cdot)$ are the same, namely $a$ and $c$. However, $\sum_{i=1}^{3} X_i = a + c$, while $\sum_{i=1}^{3} X_i' = a - b + c$, and these differ whenever $b \neq 0$. Therefore such a function $h'(\cdot, \cdot)$ cannot exist.
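The argument can be checked exhaustively for a small field. The Python sketch below (ours; the helper name is hypothetical) takes $GF(2^q)$ with elements stored as $q$-bit integers, so that addition and subtraction are both XOR, and searches for two input triples that agree on $(X_1 + X_2, X_2 + X_3)$ but not on $X_1 + X_2 + X_3$; finding one shows that no function $h'$ exists for that field.

```python
from itertools import product

def find_witness(q_bits):
    """
    Search GF(2^q)^3 for two inputs with equal (X1 + X2, X2 + X3) but different
    X1 + X2 + X3. Returns the pair of triples, or None if h' could exist.
    """
    first = {}  # (X1 + X2, X2 + X3) -> (first triple seen, its sum)
    for x in product(range(1 << q_bits), repeat=3):
        key = (x[0] ^ x[1], x[1] ^ x[2])
        total = x[0] ^ x[1] ^ x[2]
        if key in first and first[key][1] != total:
            return first[key][0], x
        first.setdefault(key, (x, total))
    return None

print(find_witness(2))  # ((0, 1, 1), (1, 0, 0))
```

The witness is the pattern used in the text with $a = 1$, $b = 1$, $c = 0$: $(a, 0, c) = (1, 0, 0)$ and $(a - b, b, c - b) = (0, 1, 1)$.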

Note that we have presented the proof in the context of 
scalar nonlinear network codes. However, even if we consider 
vector sources along with vector network codes, the same idea 
of the proof can be used. 

VII. Case of three sources and three terminals 

It is evident from the counter-example discussed in Section VI that the characterization of the rate region in the case of three sources and three terminals is different from the cases discussed in Sections IV and V. In this section, we show that as long as each source is connected by two edge disjoint paths to each terminal, the terminals can recover the sum. We present efficient linear encoding schemes that allow communication in this case. The main result of this section can be summarized as follows.

Theorem 3: Let $G = (V, E)$ be a directed acyclic network with three sources $s_1, s_2, s_3$ and three terminals $t_1, t_2, t_3$. Let $X_i$ be the information present at source $s_i$. If there exist two edge disjoint paths between each source/terminal pair, then there exists a network coding scheme in which the sum $X_1 + X_2 + X_3$ is obtained at each terminal $t_j$. Moreover, such a network code can be found efficiently.

Fig. 1. Example of a network with three sources and three terminals, such that there exists at least one path between each source and each terminal. However, not all the terminals can compute $\sum_{i=1}^{3} X_i$.

It turns out that the proof of this result requires several new 
techniques that may be of independent interest. 

Remark 2: Our example in Section VI shows that a single path between each $s_i - t_j$ pair does not suffice. At the other extreme, if there are three edge-disjoint paths between each $s_i - t_j$ pair, then one can actually multicast $X_1$, $X_2$ and $X_3$ to each terminal [3]. Our results show that two edge disjoint paths between each source-terminal pair are sufficient for multicasting sums.

We start by giving an overview of our proof. Roughly speaking, our approach for determining the desired network code has three steps. In the first step, we turn our graph $G$ into a graph $\hat{G} = (\hat{V}, \hat{E})$ in which each internal node $v \in \hat{V}$ is of total degree (in-degree + out-degree) at most three. We refer to such graphs as structured graphs. Our efficient reduction follows that appearing in [20], and has the following properties: (a) $\hat{G}$ is acyclic. (b) For every source (terminal) in $G$ there is a corresponding source (terminal) in $\hat{G}$. (c) For any two edge disjoint paths $P_1$ and $P_2$ connecting a source-terminal pair in $G$, there exist two vertex disjoint paths in $\hat{G}$ connecting the corresponding source-terminal pair. Here and throughout, we say two paths between a source-terminal pair are vertex disjoint even though they share their first and last vertices (i.e., the source and terminal at hand). (d) Any feasible network coding solution in $\hat{G}$ can be efficiently turned into a feasible network coding solution in $G$.

It is not hard to verify that proving Theorem 3 on structured graphs implies a proof for general graphs $G$ as well. Indeed, given a network $G$ satisfying the requirements of Theorem 3, construct the corresponding network $\hat{G}$. By the properties above, $\hat{G}$ also satisfies the requirements of Theorem 3. Assuming Theorem 3 is proven for structured graphs $\hat{G}$, we conclude the existence of a feasible network code in $\hat{G}$. Finally, this network code can be converted (by property (d) above) into a feasible network code for $G$ as desired. We specify the mapping between $G$ and $\hat{G}$ and give proof of properties (a)-(d) in Section VII-A. For notational reasons, from this point on in the discussion we will assume that our input graph $G$ is structured, which is now clear to be w.l.o.g.

In the second step of our proof, we give edges and vertices 
in the graph G certain labels depending on the combinatorial 
structure of G. This step can be viewed as a decomposition of 
the graph G (both the vertex set and the edge set) into certain 
class sets which may be of interest beyond the context of this 
work. These classes will later play a major role in our analysis. 
The decomposition of $G$ is given in detail in Section VII-B.

Finally, in the third and final step of our proof, using the labeling above we perform a case analysis for the proof of Theorem 3. Namely, based on the terminology set in Section VII-B, we identify several scenarios and prove Theorem 3 assuming they hold. As the different scenarios we consider will cover all possible ones, we will conclude our proof. Our detailed case analysis is given in Section VII-C and Section VIII.

All in all, as will be evident from the sections yet to come, 
our proof is constructive, and each of its steps can be done 
efficiently. This will result in the efficient construction of the 
desired network code for G. We now proceed to formalize the 
steps of our proof. 

A. The reduction 

Let $G = (V, E)$ be our input network, and let $s_i$ and $t_i$ be the given sources and terminals. We now efficiently construct a structured graph $\hat{G} = (\hat{V}, \hat{E})$ in which each internal node $v \in \hat{V}$ is of total degree at most three, with the following additional properties: (a) $\hat{G}$ is acyclic. (b) For every source (terminal) in $G$ there is a corresponding source (terminal) in $\hat{G}$. (c) For any two edge disjoint paths $P_1$ and $P_2$ connecting a source-terminal pair in $G$, there exist two vertex disjoint paths in $\hat{G}$ connecting the corresponding source-terminal pair. (d) Any feasible network coding solution in $\hat{G}$ can be efficiently turned into a feasible network coding solution in $G$. Our reduction follows that appearing in [20] and is given here for completeness.

The reduction is done iteratively according to the following procedure, in which we reduce the total degree of internal vertices to at most 3. First we note that any source (terminal) in $G$ is also one in $\hat{G}$.

1) Reducing degrees: Let $\hat{G}$ be the graph formed from $G$ by iteratively replacing each node $v \in G$ that is not a source or a terminal node and whose degree is more than three by a subgraph $\Gamma_v$, constructed as follows. Let $\{(x_i, v) \mid i = 1, \ldots, d_{in}(v)\}$ and $\{(v, y_i) \mid i = 1, \ldots, d_{out}(v)\}$ be the incoming and outgoing links of $v$, respectively, where $d_{in}(v)$ and $d_{out}(v)$ are the in- and out-degrees of $v$. For each incoming link $(x_i, v)$ of $v$, we add to $\Gamma_v$ a node $\hat{x}_i$ and a binary tree with root at $\hat{x}_i$ and $d_{out}(v)$ leaves $\hat{x}_i^1, \ldots, \hat{x}_i^{d_{out}(v)}$. Similarly, for each outgoing link $(v, y_i)$ of $v$, we add to $\Gamma_v$ a node $\hat{y}_i$ and an inverted binary tree with root at $\hat{y}_i$ and $d_{in}(v)$ leaves $\hat{y}_i^1, \ldots, \hat{y}_i^{d_{in}(v)}$. Next, for each $1 \leq i \leq d_{in}(v)$ and $1 \leq j \leq d_{out}(v)$ we add an edge $(\hat{x}_i^j, \hat{y}_j^i)$ to $\Gamma_v$. Finally, we connect $\Gamma_v$ to the rest of the network by adding edges $(x_i, \hat{x}_i)$ for $1 \leq i \leq d_{in}(v)$ and $(\hat{y}_i, y_i)$ for $1 \leq i \leq d_{out}(v)$. Figures 2 and 3 demonstrate the construction of the subgraph $\Gamma_v$ for a node $v$ with $d_{in}(v) = d_{out}(v) = 3$. Note that for any two links $(x_i, v)$ and $(v, y_j)$ there is a path in $\Gamma_v$ that connects $x_i$ and $y_j$.
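The gadget $\Gamma_v$ can be generated mechanically. The Python sketch below is our own rendering of the construction above (function names and the `("g", k)` labels for new nodes are hypothetical); it builds binary trees in which every internal node has two children, adds the cross edges $(\hat{x}_i^j, \hat{y}_j^i)$, and checks that every new node has total degree at most three and that each $x_i$ reaches each $y_j$.

```python
from collections import defaultdict
from itertools import count

def binary_tree(root, n_leaves, fresh, inverted=False):
    """Binary tree rooted at root with exactly n_leaves leaves; edges point away
    from the root, or towards it if inverted. Returns (edges, leaves)."""
    edges, leaves = [], [root]
    while len(leaves) < n_leaves:
        parent = leaves.pop(0)  # split the oldest leaf into two children
        for child in (next(fresh), next(fresh)):
            edges.append((child, parent) if inverted else (parent, child))
            leaves.append(child)
    return edges, leaves

def gadget(in_nbrs, out_nbrs):
    """Edges of Gamma_v, plus its connecting edges, for a node v with
    in-neighbours x_1..x_din and out-neighbours y_1..y_dout."""
    fresh = (("g", k) for k in count())  # labels for the new gadget nodes
    d_in, d_out = len(in_nbrs), len(out_nbrs)
    edges, x_leaves, y_leaves = [], [], []
    for x in in_nbrs:
        root = next(fresh)                                   # hat-x_i
        tree, leaves = binary_tree(root, d_out, fresh)
        edges += [(x, root)] + tree
        x_leaves.append(leaves)                              # hat-x_i^1 .. hat-x_i^dout
    for y in out_nbrs:
        root = next(fresh)                                   # hat-y_j
        tree, leaves = binary_tree(root, d_in, fresh, inverted=True)
        edges += tree + [(root, y)]
        y_leaves.append(leaves)                              # hat-y_j^1 .. hat-y_j^din
    for i in range(d_in):
        for j in range(d_out):
            edges.append((x_leaves[i][j], y_leaves[j][i]))   # (hat-x_i^j, hat-y_j^i)
    return edges

def check_gadget(in_nbrs, out_nbrs):
    """Return (max total degree of a gadget node, whether every x_i reaches every y_j)."""
    edges = gadget(in_nbrs, out_nbrs)
    degree, adj = defaultdict(int), defaultdict(list)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].append(v)

    def reach(s):
        seen, stack = {s}, [s]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    max_deg = max(d for node, d in degree.items() if isinstance(node, tuple))
    all_pairs = all(y in reach(x) for x in in_nbrs for y in out_nbrs)
    return max_deg, all_pairs

print(check_gadget(["x1", "x2", "x3"], ["y1", "y2", "y3"]))  # (3, True)
```

Splitting the oldest leaf keeps the trees balanced, which keeps paths through the gadget short; any binary tree with the right number of leaves would satisfy the degree bound.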



Fig. 2. A node $v \in G$ with incoming links from $x_1, x_2, x_3$ and outgoing links to $y_1, y_2, y_3$.

Fig. 3. The gadget $\Gamma_v$ for the node $v$ in Figure 2.

We proceed to analyze the properties of $\hat{G}$, namely we show that $\hat{G}$ is structured. Properties (a), (b) and (c) follow directly from our construction. For property (d), consider a feasible network code for the network $\hat{G}$. A feasible network code for $G$ is constructed as follows. Let $e = (u, v)$ be an edge in $G$. Let $\hat{e}$ be the corresponding edge between $\Gamma_u$ and $\Gamma_v$ in $\hat{G}$. Here we assume both $u$ and $v$ were replaced by corresponding gadgets; other cases can be proven analogously. The encoding function $f_e$ for $e = (u, v)$ is determined by the encoding functions $f_{e''}$ of the links $e''$ that belong to $\Gamma_u$. Specifically, let $A = \{(x_1, \hat{x}_1), \ldots, (x_{d_{in}(u)}, \hat{x}_{d_{in}(u)})\}$ be the incoming links of $\Gamma_u$, where $d_{in}(u)$ is the in-degree of $u$ in $G$. The construction of $\hat{G}$ implies that the information transmitted on the link $\hat{e}$ is a function $f_{\hat{e}}$ of the packets transmitted on the links in $A$. We use this exact function as the desired encoding function $f_e$. The fact that the incoming links of $u$ in $G$ correspond to the links in $A$ implies the feasibility of the resulting code for $G$.



B. The decomposition 

In this section we present our structural decomposition of $G = (V, E)$. We assume throughout that $G$ is directed and acyclic, that it has three sources $s_1, s_2, s_3$ and three terminals $t_1, t_2, t_3$, and that any internal vertex in $V$ (namely, any vertex which is neither a source nor a sink) has total degree at most 3. Moreover, we assume $G$ satisfies the connectivity requirements specified in Theorem 3.

We start by labeling the vertices of $G$. A vertex $v \in V$ is labeled by a pair $(c_s, c_t)$ specifying how many sources (terminals) it is connected to. Specifically, $c_s(v)$ equals the number of sources $s_i$ for which there exists a path connecting $s_i$ and $v$ in $G$. Similarly, $c_t(v)$ equals the number of terminals $t_j$ for which there exists a path connecting $v$ and $t_j$ in $G$. For example, any source is labeled by the pair $(1, 3)$, and any terminal by the pair $(3, 1)$. An internal vertex $v$ labeled $(\cdot, 1)$ is connected to a single terminal only. This implies that any information leaving $v$ will reach at most a single terminal. Such vertices play an important role in the definitions to come. This concludes the labeling of $V$.
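The labeling amounts to two reachability computations per vertex. The following Python sketch computes $(c_s(v), c_t(v))$ by depth-first search. The toy network, the vertex names and the function names `reachable_from` and `label_vertices` are hypothetical and chosen only for illustration; the toy graph respects the degree-3 bound on internal vertices.

```python
from collections import defaultdict


def reachable_from(adj, start):
    """Return every vertex reachable from `start` by a directed path (start included)."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def label_vertices(edges, sources, terminals):
    """Return {v: (c_s(v), c_t(v))} for every vertex of a DAG given as an edge list."""
    adj, vertices = defaultdict(list), set()
    for u, v in edges:
        adj[u].append(v)
        vertices.update((u, v))
    # Vertices downstream of each source (a source counts as connected to itself).
    downstream = {s: reachable_from(adj, s) for s in sources}
    labels = {}
    for v in vertices:
        below = reachable_from(adj, v)  # everything v can reach, terminals included
        c_s = sum(1 for s in sources if v in downstream[s])
        c_t = sum(1 for t in terminals if t in below)
        labels[v] = (c_s, c_t)
    return labels


# Hypothetical toy network: s1, s2 merge at a; a and s3 merge at w; w fans out via b and c.
EDGES = [("s1", "a"), ("s2", "a"), ("a", "w"), ("s3", "w"), ("w", "b"),
         ("b", "t1"), ("b", "c"), ("c", "t2"), ("c", "t3")]
labels = label_vertices(EDGES, ["s1", "s2", "s3"], ["t1", "t2", "t3"])
print(labels["w"], labels["c"], labels["s1"], labels["t1"])
```

In this toy instance $w$ is a $(3, 3)$ node, $a$ is a $(2, 3)$ node and $c$ is a $(3, 2)$ node, so it also illustrates the node types used in the case analysis below.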

An edge $e = (u, v)$ for which $v$ is labeled $(\cdot, 1)$ will be referred to as a terminal edge. Namely, any information flowing on $e$ can reach at most a single terminal. If this terminal is $t_j$, then we say that $e$ is a $t_j$-edge. Clearly, the set of $t_1$-edges is disjoint from the set of $t_2$-edges (and similarly for any pair of terminals). An edge that is not a terminal edge will be referred to as a remaining edge, or an r-edge for short.
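The edge classification depends only on the set of terminals reachable from the head of each edge. A minimal Python sketch, assuming the network is given as an edge list; the function name `classify_edges` and the toy network are hypothetical.

```python
from collections import defaultdict


def classify_edges(edges, terminals):
    """Map each edge (u, v) to t_j if it is a t_j-edge, and to "r" otherwise.

    An edge is a terminal edge when its head v is connected to exactly one terminal.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    def terminals_reached(x):
        # Terminals reachable from x by a directed path (x itself included).
        seen, stack = {x}, [x]
        while stack:
            y = stack.pop()
            for z in adj[y]:
                if z not in seen:
                    seen.add(z)
                    stack.append(z)
        return [t for t in terminals if t in seen]

    kind = {}
    for u, v in edges:
        reached = terminals_reached(v)
        kind[(u, v)] = reached[0] if len(reached) == 1 else "r"
    return kind


# Hypothetical toy network with three sources and three terminals.
EDGES = [("s1", "a"), ("s2", "a"), ("a", "w"), ("s3", "w"), ("w", "b"),
         ("b", "t1"), ("b", "c"), ("c", "t2"), ("c", "t3")]
kind = classify_edges(EDGES, ["t1", "t2", "t3"])
print(kind)
```

Here $(b, t_1)$ is a $t_1$-edge, while $(b, c)$ is an r-edge because $c$ still reaches both $t_2$ and $t_3$.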

We now prove some structural properties of the edge sets we have defined. First, there exists an ordering of the edges in $E$ in which any r-edge comes before any terminal edge and, in addition, there is no path from a terminal edge to an r-edge. This is obtained by an appropriate topological order of $G$: since $c_t$ cannot increase along a directed path, every edge whose tail is reachable from the head of a terminal edge is itself a terminal edge. Moreover, for any terminal $t_j$, the set of $t_j$-edges forms a connected subgraph of $G$ rooted at $t_j$. To see this, note that by definition each $t_j$-edge $e$ is connected to $t_j$, and all the edges on a path between $e$ and $t_j$ are $t_j$-edges. Finally, the head of an r-edge is of type $(\cdot, 2)$ or $(\cdot, 3)$ (as otherwise it would be a terminal edge).

For each terminal $t_j$ we now define a set of vertices referred to as the leaf set $L_j$ of $t_j$. This definition plays an important role in our discussions.

Definition 4: Leaf set of a terminal. Let $P = (s_i = v_1, v_2, \ldots, v_\ell = t_j)$ be a path from $s_i$ to $t_j$. Consider the intersection of $P$ with the set of $t_j$-edges. This intersection consists of a subpath $P' = (v_p, \ldots, v_\ell = t_j)$ of $P$ for which the label of $v_p$ is either $(\cdot, 2)$ or $(\cdot, 3)$, and the label of any other vertex in $P'$ is $(\cdot, 1)$. We refer to $v_p$ as the leaf of $t_j$ corresponding to path $P$, and to the set of all leaves of $t_j$ as the leaf set $L_j$.

We remark that (a) the leaf set of $t_j$ is the set of nodes of in-degree 0 in the subgraph consisting of $t_j$-edges, and (b) a source node can be a leaf node for a given terminal.
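Remark (a) gives a direct way to compute a leaf set: collect the $t_j$-edges and keep the tails that have no incoming $t_j$-edge. The sketch below assumes an edge-list representation; the function name `leaf_set` and the two-terminal toy graph are hypothetical. In this toy graph the source $s_1$ is itself a leaf, as allowed by remark (b).

```python
from collections import defaultdict


def leaf_set(edges, terminal, terminals):
    """Leaves of `terminal`: tails of its t_j-edges that have no incoming t_j-edge."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    def reaches_only(x, t):
        # True when the only terminal reachable from x is t, i.e. x is labeled (., 1).
        seen, stack = {x}, [x]
        while stack:
            y = stack.pop()
            for z in adj[y]:
                if z not in seen:
                    seen.add(z)
                    stack.append(z)
        return [s for s in terminals if s in seen] == [t]

    tj_edges = [(u, v) for u, v in edges if reaches_only(v, terminal)]
    heads = {v for _, v in tj_edges}
    return {u for u, _ in tj_edges if u not in heads}


# Hypothetical two-terminal toy graph.
EDGES = [("s1", "x"), ("x", "y"), ("y", "t1"), ("s1", "z"), ("z", "t2"),
         ("s2", "w"), ("w", "y"), ("w", "t2")]
print(leaf_set(EDGES, "t1", ["t1", "t2"]))
```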

C. Case analysis 

We now present a classification of networks based on the node labeling procedure presented above. For each class of networks we shall argue that each terminal can compute the sum of the sources $X_1 + X_2 + X_3$. Our proof is constructive, i.e., it can be interpreted as an algorithm for finding the network code that allows each terminal to recover the sum.

1) Case 0: There exists a node of type $(3, 3)$ in $G$. Suppose node $v$ is of type $(3, 3)$. This implies that there exist paths $\mathrm{path}(s_i - v)$, for $i = 1, \ldots, 3$, and $\mathrm{path}(v - t_j)$, for $j = 1, \ldots, 3$. Consider the subgraph induced by these paths, and color each edge on $\cup_{i=1}^{3} \mathrm{path}(s_i - v)$ red and each edge on $\cup_{j=1}^{3} \mathrm{path}(v - t_j)$ blue. We claim that, as $G$ is acyclic, at the end of this procedure each edge gets only one color. To see this, suppose that a red edge is also colored blue. This implies that it lies on a path from a source to $v$ and on a path from $v$ to a terminal, i.e., its existence implies a directed cycle in the graph.

Now, we can find an inverted tree that is a subset of the red edges directed into $v$ and, similarly, a tree rooted at $v$ with $t_1$, $t_2$ and $t_3$ as leaves using the blue edges. Finally, we can compute $X_1 + X_2 + X_3$ at $v$ over the red tree and multicast it to $t_1$, $t_2$ and $t_3$ over the blue subgraph. More specifically, one may use an encoding scheme in which internal nodes of the red tree receiving $Y_1$ and $Y_2$ send on their outgoing edge the sum $Y_1 + Y_2$.
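A minimal Python sketch of the Case 0 scheme, assuming symbols from the prime field $GF(5)$ (an illustrative choice; no field is fixed here) and a hypothetical toy instance in which the red in-tree and the blue out-tree are given explicitly. Every non-source node sends the sum of its incoming symbols on each outgoing edge; on the blue tree each node has a single incoming edge, so this rule reduces to forwarding. The name `run_sum_code` is illustrative.

```python
from itertools import product

P = 5  # size of the prime field GF(P) used for the symbols (illustrative choice)

# Hypothetical Case-0 instance: red in-tree into the (3, 3) node w,
# blue out-tree from w to t1, t2 and t3.
RED = [("s1", "a"), ("s2", "a"), ("a", "w"), ("s3", "w")]
BLUE = [("w", "b"), ("b", "t1"), ("b", "c"), ("c", "t2"), ("c", "t3")]
ORDER = ["s1", "s2", "s3", "a", "w", "b", "c", "t1", "t2", "t3"]  # topological order


def run_sum_code(edges, order, x):
    """Sources send their symbol; every other node sends, on each outgoing edge,
    the sum (mod P) of the symbols on its incoming edges."""
    on_edge, value = {}, {}
    for node in order:
        if node in x:
            value[node] = x[node] % P
        else:
            value[node] = sum(on_edge[e] for e in edges if e[1] == node) % P
        for e in edges:
            if e[0] == node:
                on_edge[e] = value[node]
    return value


# Exhaustive check: every terminal obtains X1 + X2 + X3 for all inputs.
for x1, x2, x3 in product(range(P), repeat=3):
    out = run_sum_code(RED + BLUE, ORDER, {"s1": x1, "s2": x2, "s3": x3})
    assert all(out[t] == (x1 + x2 + x3) % P for t in ("t1", "t2", "t3"))
```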

2) Case 1: There exists a node of type $(2, 3)$ in $G$. Note that it is sufficient to consider the case when there does not exist a node of type $(3, 3)$ in $G$. We shall show that this case is equivalent to a two-source, three-terminal problem.

W.l.o.g., suppose that there exists a $(2, 3)$ node $v$ that is connected to $s_2$ and $s_3$. We color the edges on $\mathrm{path}(s_2 - v)$ and $\mathrm{path}(s_3 - v)$ blue. Next, consider the set of paths $\cup_{j=1}^{3} \mathrm{path}(s_1 - t_j)$. We claim that these paths do not have any intersection with the blue subgraph. This is because the existence of such an intersection would imply that there exists a path between $s_1$ and $v$, which in turn implies that $v$ would be a $(3, 3)$ node. We can now compute $X_2 + X_3$ at $v$ by finding a tree consisting of blue edges that are directed into $v$. Suppose that the blue edges are removed from $G$ to obtain a graph $G'$. Since $G$ is directed and acyclic, there still exists a path from $v$ to each terminal after the removal. Now, note that (a) $G'$ is a graph in which there exists at least one path from $s_1$ to each terminal and at least one path from $v$ to each terminal, and (b) $v$ can be considered as a source that contains $X_2 + X_3$. Thus $G'$ satisfies the condition of our two-source result in Section IV (which addresses the two-source version of the problem at hand), and therefore we are done.

3) Case 2: There exists a node of type $(3, 2)$ in $G$. As before, it suffices to consider the case when there do not exist any $(3, 3)$ or $(2, 3)$ nodes in the graph. Suppose that there exists a $(3, 2)$ node $v$ and, w.l.o.g., assume that it is connected to $t_1$ and $t_2$. We consider the subgraph $G'$ induced by the union of the following sets of paths:

1) $\cup_{i=1}^{3} \mathrm{path}(s_i - v)$,

2) $\cup_{j=1}^{2} \mathrm{path}(v - t_j)$, and

3) $\cup_{i=1}^{3} \mathrm{path}(s_i - t_3)$.

Note that, as argued previously, a subset of the edges of $\cup_{i=1}^{3} \mathrm{path}(s_i - v)$ can be found so that they form a tree directed into $v$. For the purposes of this proof, we assume that this has already been done, i.e., the graph $\cup_{i=1}^{3} \mathrm{path}(s_i - v)$ is a tree directed into $v$.

The basic idea of the proof is to show that the paths from the sources to terminal $t_3$, i.e., $\cup_{i=1}^{3} \mathrm{path}(s_i - t_3)$, are such that their overlap with the other paths is very limited. Thus, the entire graph can be decomposed into two parts, one over which the sum is transmitted to $t_1$ and $t_2$, and another over which the sum is transmitted to $t_3$. Towards this end, we have the following two claims.

Claim 1: The path $\mathrm{path}(s_1 - t_3)$ cannot have an intersection with either $\mathrm{path}(s_2 - v)$ or $\mathrm{path}(s_3 - v)$.

Proof: Suppose that such an intersection occurred at a node $v'$. Then it is easy to see that $v'$ is connected to at least two sources and to all three terminals, and therefore is a node of type $(2, 3)$ or $(3, 3)$, which contradicts our assumption. ■

In an analogous manner we can see that (a) $\mathrm{path}(s_2 - t_3)$ cannot have an intersection with either $\mathrm{path}(s_1 - v)$ or $\mathrm{path}(s_3 - v)$, and (b) $\mathrm{path}(s_3 - t_3)$ cannot have an intersection with either $\mathrm{path}(s_1 - v)$ or $\mathrm{path}(s_2 - v)$.

Claim 2: The paths $\mathrm{path}(s_1 - t_3)$, $\mathrm{path}(s_2 - t_3)$ and $\mathrm{path}(s_3 - t_3)$ cannot have an intersection with either $\mathrm{path}(v - t_1)$ or $\mathrm{path}(v - t_2)$.

Proof: If such an intersection happened, then $v$ would also be connected to $t_3$, which would imply that $v$ is a $(3, 3)$ node. This is a contradiction. ■

Let $v_i$ be the node closest to $v$ that belongs to both $\mathrm{path}(s_i - v)$ and $\mathrm{path}(s_i - t_3)$ (notice that $v_i$ may equal $s_i$, but it cannot equal $v$). Consider the following coding solution on $G'$. On the paths $\mathrm{path}(s_i - v_i)$ send $X_i$. On the paths $\mathrm{path}(v_i - v)$ send information that will allow $v$ to obtain $X_1 + X_2 + X_3$. This can easily be done, as these (latter) paths form a tree into $v$. Namely, one may use an encoding scheme in which internal nodes receiving $Y_1$ and $Y_2$ send on their outgoing edge the sum $Y_1 + Y_2$. By the claims above (and the fact that $G'$ is acyclic), the information flowing on the edges $e$ in $\mathrm{path}(v_i - t_3)$, $i = 1, \ldots, 3$, has not been specified by the encoding defined above. Thus, one may send information on the paths $\mathrm{path}(v_i - t_3)$ that will allow $t_3$ to obtain $X_1 + X_2 + X_3$. Here we assume that the paths $\mathrm{path}(v_i - t_3)$ form a tree into $t_3$; if this is not the case, we may find a subset of edges in these paths with this property. Once more, by the claims above (and the fact that $G'$ is acyclic), the information flowing on the edges $e$ in the paths $\mathrm{path}(v - t_1)$ and $\mathrm{path}(v - t_2)$ has not been specified (by the encodings above). On these edges we may transmit the sum $X_1 + X_2 + X_3$ present at $v$.
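Locating $v_i$ and the three path segments used by this coding solution is a simple list computation. The sketch below assumes that $\mathrm{path}(s_i - v)$ and $\mathrm{path}(s_i - t_3)$ are given as vertex lists starting at $s_i$; the function name `split_at_vi`, the dictionary keys and the example paths are hypothetical.

```python
def split_at_vi(path_to_v, path_to_t3):
    """Given path(s_i - v) and path(s_i - t3) as vertex lists starting at s_i, return
    v_i (the vertex of path(s_i - v) closest to v that also lies on path(s_i - t3))
    together with the three segments used by the Case 2 code."""
    on_t3 = set(path_to_t3)
    k = max(idx for idx, x in enumerate(path_to_v) if x in on_t3)
    v_i = path_to_v[k]
    j = path_to_t3.index(v_i)
    return {
        "v_i": v_i,
        "clear": path_to_v[: k + 1],  # s_i -> v_i: carries X_i in the clear
        "to_v": path_to_v[k:],        # v_i -> v: part of the in-tree that sums at v
        "to_t3": path_to_t3[j:],      # v_i -> t3: part of the in-tree that sums at t3
    }


print(split_at_vi(["s1", "a", "b", "c", "v"], ["s1", "a", "b", "d", "t3"]))
```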

4) Case 3: There do not exist $(3, 3)$, $(2, 3)$ or $(3, 2)$ nodes in $G$. Note that thus far we have not used the fact that there exist two edge-disjoint paths from each source to each terminal in $G$. In the previous cases, the problem structure that emerged from the node labeling allowed us to communicate $X_1 + X_2 + X_3$ by using just one path between each $s_i - t_j$ pair. However, for the case at hand we will indeed need to use the fact that there exist two paths between each $s_i - t_j$ pair. As we will see, this significantly complicates the analysis. We present the analysis of Case 3 in the upcoming section.
VIII. Analysis of Case 3

Note that the node labeling procedure presented above assigns a label $(c_s(v), c_t(v))$ to a node $v$, where $c_s(v)$ ($c_t(v)$) is the number of sources (terminals) that $v$ is connected to. This labeling ignores the actual identity of the sources and terminals that have connections to $v$. It turns out that we need an additional, somewhat finer notion of node connectivity to analyze Case 3. We emphasize that throughout this section we still operate under the assumption that the reduction in Section VII-A has been performed and that each node has total degree at most three.

Towards this end, for Case 3 (i.e., in a graph $G$ without $(3, 3)$, $(2, 3)$ and $(3, 2)$ nodes) we introduce the notion of the color of a node. For each $(2, 2)$ node in $G$, the color of the node is defined as the 4-tuple of sources and terminals it is connected to; e.g., if $v$ is connected to sources $s_1$ and $s_2$ and terminals $t_1$ and $t_2$, then its color, denoted $\mathrm{col}(v)$, is $(s_1, s_2, t_1, t_2)$. We shall also say that the source color of $v$ is $(s_1, s_2)$ and the terminal color of $v$ is $(t_1, t_2)$. The source and terminal colors are sometimes referred to as source and terminal labels.

The following claim is immediate.

Claim 3: If there is a $(2, 2)$ node $v$ in $G$ of color $\mathrm{col}(v)$, then each terminal in the terminal color of $v$ has at least one leaf with color $\mathrm{col}(v)$. For example, if $\mathrm{col}(v) = (s_1, s_2, t_1, t_2)$, then both $t_1$ and $t_2$ have leaves with color $(s_1, s_2, t_1, t_2)$.

Proof: W.l.o.g., let $\mathrm{col}(v) = (s_1, s_2, t_1, t_2)$. This implies that there exists a path $P$ between $v$ and $t_1$. Let $\ell$ be a leaf of $t_1$ on $P$. Recall that $\ell$ is defined as the last node on $P$ with terminal label at least 2, namely $c_t(\ell) \geq 2$. Moreover, $c_t(\ell)$ is exactly 2 and no larger, as otherwise $c_t(v)$ would also be greater than 2, contradicting our assumptions in the claim. As $\ell$ is downstream of $v$, the terminals connected to $\ell$ form a subset of those connected to $v$; hence the terminal color of $\ell$ is exactly $(t_1, t_2)$. Similarly, it holds that $c_s(\ell) \geq c_s(v) = 2$. Here also, $c_s(\ell)$ is exactly 2, as otherwise $\ell$ would be a $(3, 2)$ node (contradicting our assumption for Case 3). This implies that the source color of $\ell$ is $(s_1, s_2)$. Therefore, $t_1$ has a leaf of color $(s_1, s_2, t_1, t_2)$. A similar argument holds for $t_2$. ■

The notion of a color is useful for the set of graphs under Case 3, since we can show that there can never be an edge between $(2, 2)$ nodes of different colors. We exploit this property extensively below.

Lemma 3: Consider a graph $G$ with sources $s_i$, $i = 1, \ldots, 3$, and terminals $t_j$, $j = 1, \ldots, 3$, such that it does not have any $(3, 3)$, $(2, 3)$ or $(3, 2)$ nodes. There does not exist an edge between $(2, 2)$ nodes of different colors in $G$.

Proof: Assume otherwise, and consider two $(2, 2)$ nodes $v_1$ and $v_2$ with $\mathrm{col}(v_1) \neq \mathrm{col}(v_2)$ for which there is an edge $(v_1, v_2)$ in $G$. Note that if the source colors of $\mathrm{col}(v_1)$ and $\mathrm{col}(v_2)$ are different, then $v_2$ has to be a $(3, 2)$ node (it is connected to the two sources of $v_1$ as well as to its own two sources), which is a contradiction. Likewise, if the terminal colors of $\mathrm{col}(v_1)$ and $\mathrm{col}(v_2)$ are different, then $v_1$ has to be a $(2, 3)$ node (it is connected to the two terminals of $v_2$ as well as to its own two terminals), which is also a contradiction. ■

Lemma 3 implies that we are free to assign any coding coefficients on a subgraph induced by nodes of one color, without having to worry about the effect of this on another subgraph induced by nodes of a different color (simply because there is no such effect).
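Colors and the conclusion of Lemma 3 can be checked mechanically on a given network. The sketch below, with hypothetical function names and a hypothetical Case-3 toy graph containing the two colors $(s_1, s_2, t_1, t_2)$ and $(s_2, s_3, t_2, t_3)$, computes the color of every $(2, 2)$ node from forward and backward reachability and lists any edge joining two $(2, 2)$ nodes of different colors; Lemma 3 predicts that this list is empty.

```python
from collections import defaultdict


def colors_of_22_nodes(edges, sources, terminals):
    """Return {v: (source color, terminal color)} for every (2, 2) node of a DAG."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    def closure(adj, x):
        # All vertices reachable from x along the adjacency `adj` (x included).
        seen, stack = {x}, [x]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    colors = {}
    for v in set(fwd) | set(bwd):
        up, down = closure(bwd, v), closure(fwd, v)
        src = tuple(s for s in sources if s in up)
        trm = tuple(t for t in terminals if t in down)
        if len(src) == 2 and len(trm) == 2:
            colors[v] = (src, trm)
    return colors


def cross_color_edges(edges, colors):
    """Edges joining two (2, 2) nodes of different colors (none exist in Case 3)."""
    return [(u, v) for u, v in edges
            if u in colors and v in colors and colors[u] != colors[v]]


# Hypothetical Case-3 toy graph with two colors.
EDGES = [("s1", "a"), ("s2", "a"), ("a", "b"), ("b", "t1"), ("b", "t2"),
         ("s2", "c"), ("s3", "c"), ("c", "d"), ("d", "t2"), ("d", "t3")]
colors = colors_of_22_nodes(EDGES, ["s1", "s2", "s3"], ["t1", "t2", "t3"])
print(colors, cross_color_edges(EDGES, colors))
```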



The basic idea of our proof is the following. We divide the set of graphs under Case 3 into various classes, depending on the number of colors that exist in the graph. It turns out that as long as the number of colors in the graph is not 2, i.e., it is 0, 1, or 3 and higher, there is a simple argument which shows that each terminal can be satisfied. The argument in the case of two colors is a bit more involved and is developed separately. It can be shown that our counter-example in Section VI is a case where there are two colors. Note, however, that in our counter-example there are certain $s_i - t_j$ pairs that have only one path between them. We now proceed to develop these arguments formally.

Claim 4: Consider the subgraph induced by a certain color, w.l.o.g. $(s_1, s_2, t_1, t_2)$, in $G$, denoted by $G_{(s_1, s_2, t_1, t_2)}$. There exists an assignment of encoding vectors over $G_{(s_1, s_2, t_1, t_2)}$ such that any (unit-entropy) function of the sources $X_1$ and $X_2$ can be multicast to all nodes in $G_{(s_1, s_2, t_1, t_2)}$. Moreover, such encoding vector assignments can be done independently over subgraphs of different colors.

Proof: Note that we are working with directed acyclic graphs. Thus, there is a node $v^*$ in $G_{(s_1, s_2, t_1, t_2)}$ such that it has no incoming edges in $G_{(s_1, s_2, t_1, t_2)}$. Next, note that the path from $s_1$ to $v^*$ has no intersection with a path from $s_2$ or $s_3$. To see this, suppose that there was such an intersection at node $v'$. If there is a path from $s_3$ to $v'$, then $v^*$ is a $(3, 2)$ node (which contradicts the assumption that $v^*$ is a $(2, 2)$ node). If there is a path from $s_2$ to $v'$, then $v'$ and the remaining vertices connecting $v'$ to $v^*$ on the path from $s_1$ to $v^*$ have color $(s_1, s_2, t_1, t_2)$, contradicting the fact that $v^*$ has no incoming edges in $G_{(s_1, s_2, t_1, t_2)}$. Likewise, the path from $s_2$ to $v^*$ has no intersection with a path from $s_1$ or $s_3$.

Therefore, the path from $s_1$ to $v^*$ carries $X_1$ in the clear, and likewise the path from $s_2$ to $v^*$ carries $X_2$. Thus, $v^*$ can obtain both $X_1$ and $X_2$ and can compute any (unit-entropy) function of them. Moreover, $v^*$ can transmit this function to all nodes of $G_{(s_1, s_2, t_1, t_2)}$ downstream of $v^*$. As the argument above can be repeated for any node of in-degree 0 in $G_{(s_1, s_2, t_1, t_2)}$, it follows that all nodes of $G_{(s_1, s_2, t_1, t_2)}$ obtain the desired function of $X_1$ and $X_2$.

Finally, we note that the assignments over subgraphs of different colors can be done independently, since there does not exist any edge between nodes of different colors (by Lemma 3). ■
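The nodes playing the role of $v^*$ in this proof are the nodes of the color subgraph with no incoming edge inside the subgraph. The sketch below, with hypothetical names and a hypothetical toy graph in which $a$, $e$, $b$ and $g$ all have color $(s_1, s_2, t_1, t_2)$, finds these entry nodes and checks that every node of the subgraph lies downstream of one of them, so that a function computed at the entry nodes can reach the whole subgraph.

```python
from collections import defaultdict


def entry_nodes(edges, members):
    """Nodes of the color subgraph with no incoming edge from inside the subgraph
    (the candidates for v* in the proof of Claim 4)."""
    members = set(members)
    internal_heads = {v for u, v in edges if u in members and v in members}
    return members - internal_heads


def reached_from_entries(edges, members):
    """Nodes of the subgraph lying downstream (inside the subgraph) of an entry node."""
    members = set(members)
    adj = defaultdict(list)
    for u, v in edges:
        if u in members and v in members:
            adj[u].append(v)
    seen = set(entry_nodes(edges, members))
    stack = list(seen)
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


# Hypothetical graph; a, e, b and g form the subgraph of color (s1, s2, t1, t2).
EDGES = [("s1", "a"), ("s2", "a"), ("s1", "e"), ("s2", "e"), ("a", "b"),
         ("e", "b"), ("b", "g"), ("g", "t1"), ("g", "t2")]
MEMBERS = {"a", "e", "b", "g"}
print(sorted(entry_nodes(EDGES, MEMBERS)))
print(reached_from_entries(EDGES, MEMBERS) == MEMBERS)
```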

Lemma 4: Consider a graph $G$ with sources $s_i$, $i = 1, \ldots, 3$, and terminals $t_j$, $j = 1, \ldots, 3$, such that (a) it does not have any $(3, 3)$, $(2, 3)$ or $(3, 2)$ nodes, and (b) there exists at least one $s_i - t_j$ path for all $i$ and $j$. Consider the set of all $(2, 2)$ nodes in $G$ and their corresponding colors. If there exist no colors, exactly one color, or at least three distinct colors in $G$, then there exists a set of coding vectors such that each terminal can recover $\sum_{i=1}^{3} X_i$.
Proof: Note that all leaves in $G$ are of type $(1, 2)$, $(1, 3)$ or $(2, 2)$, since a leaf has terminal label 2 or 3, and $(2, 3)$, $(3, 2)$ and $(3, 3)$ nodes do not exist in $G$. This implies that any terminal $t_j$ that does not have a $(2, 2)$ leaf with source color including $s_i$ must have a leaf at which $X_i$ is received in the clear. We refer to such leaves as singleton $X_i$ leaves. The above follows directly from the connectivity assumption (b) stated in the lemma. In Cases 2 and 3 of the analysis below, we need the field characteristic to be greater than 2.

TABLE I

Encoding on subgraphs of different source colors. Recovery of $\sum_{i=1}^{3} X_i$ is possible from any two of the received values, using additions or subtractions.

    Source color      Encoding
    $(s_1, s_2)$      $2X_1 + X_2$
    $(s_2, s_3)$      $X_2 + 2X_3$
    $(s_1, s_3)$      $X_1 - X_3$

[Figure: color nodes $(s_1, s_2, t_2, t_3)$ and $(s_2, s_3, t_1, t_3)$; legend encodings $X_1 + X_2$ and $X_3$.]

Fig. 4. A possible instance of $G_{aux}$ when the degree sequence of the terminals is $(2, 2, 2)$. The encoding specified in the legend denotes the encoding to be used on the appropriate subgraphs.

Fig. 5. A possible instance of $G_{aux}$ when the degree sequence of the terminals is $(3, 2, 1)$. The encoding specified in the legend denotes the encoding to be used on the appropriate subgraphs.

(0) Case 0. There are no colors in $G$.

This implies that there are no $(2, 2)$ nodes in $G$, and thus every terminal $t_j$ has distinct leaves holding $X_1$, $X_2$ and $X_3$, respectively. It suffices to design a simple code on the paths from those leaves to $t_j$ which enables $t_j$ to recover the sum $X_1 + X_2 + X_3$.

(i) Case 1. There is only one color in $G$.

In this case, perform greedy encoding (as defined earlier) on the r-edges. We show that each terminal can recover $\sum_{i=1}^{3} X_i$ from the content of its leaves. W.l.o.g., suppose that the color is $(s_1, s_2, t_1, t_2)$. By Claim 3, both $t_1$ and $t_2$ have leaves of this color. The greedy encoding implies that $t_1$ and $t_2$ can obtain $X_1 + X_2$ from the corresponding leaves. Moreover, both $t_1$ and $t_2$ have a leaf containing a singleton $X_3$, because of the connectivity requirements. Therefore, they can compute $\sum_{i=1}^{3} X_i$. The terminal $t_3$ has only singleton leaves, among which there is at least one $X_1$, one $X_2$ and one $X_3$ leaf. Thus it can compute their sum.

(ii) Case 2. There exist exactly three distinct colors in $G$.

It is useful to introduce an auxiliary bipartite graph that denotes the existence of the colors at the leaves of the different terminals. This bipartite graph, denoted $G_{aux}$, is constructed as follows. There are three nodes $t'_j$, $j = 1, \ldots, 3$, that denote the terminals on one side, and three nodes $c'_i$, $i = 1, \ldots, 3$, that denote the colors on the other side. If the color $c'_i$ has $t_j$ in its support, then there is an edge between $c'_i$ and $t'_j$, i.e., $t_j$ has a leaf of color $c'_i$. The following properties of $G_{aux}$ are immediate.

- Each $c'_i$ has degree 2.

- Each $t'_j$ has degree at most 3.

- Multiple edges between nodes are disallowed.

Note that there are exactly three possible source colors ($(s_1, s_2)$, $(s_2, s_3)$ and $(s_3, s_1)$) and three possible terminal colors ($(t_1, t_2)$, $(t_2, t_3)$ and $(t_3, t_1)$). We now perform a case analysis depending upon the degree sequence of the nodes $t'_j$, $j = 1, \ldots, 3$, in $G_{aux}$. The degree sequence is specified by a 3-tuple, where we note that the sum of the entries has to be 6.

a) The degree sequence is a permutation of $(0, 3, 3)$. This only happens if the terminal label of all colors $c'_i$, $i = 1, \ldots, 3$, is the same, which in turn implies that the source label of each color is distinct, i.e., the source colors are $(s_1, s_2)$, $(s_2, s_3)$ and $(s_1, s_3)$. In this case, greedy encoding (as defined earlier) works for the two terminals in the color support. This is because each of these terminals will obtain $X_1 + X_2$, $X_2 + X_3$ and $X_1 + X_3$ at its leaves (using Claims 3 and 4), from which the terminal can compute $2\sum_{i=1}^{3} X_i$, and hence $\sum_{i=1}^{3} X_i$, as the field characteristic is greater than 2. The remaining terminal is not connected to any $(2, 2)$ leaf, which implies that all its leaves contain singleton values, from which it can compute $\sum_{i=1}^{3} X_i$.

b) The degree sequence is $(2, 2, 2)$.

This only happens if all the terminal labels of the colors are distinct, i.e., the terminal labels are $(t_1, t_2)$, $(t_2, t_3)$ and $(t_1, t_3)$. Now consider the possibilities for the source labels.

If there is only one source label, then greedy encoding ensures that the sum of exactly two of the sources reaches each terminal. The connectivity condition guarantees that the remaining source is available as a singleton at a leaf of each terminal. Therefore we are done.

If there are exactly two distinct source colors, then we argue as follows. On the subgraphs induced by the colors with the same source label, perform greedy encoding. On the remaining subgraph, propagate the remaining useful source. We illustrate this with an example that is without loss of generality. Suppose that the colors are $(s_1, s_2, t_1, t_2)$, $(s_1, s_2, t_2, t_3)$ and $(s_2, s_3, t_1, t_3)$. We perform greedy encoding on the subgraphs of the first two colors, and only propagate $X_3$ on the subgraph of the third color. As shown in Figure 4, this means that terminals $t_1$ and $t_3$ are satisfied. Note that the connectivity condition dictates that $t_2$ has to have a leaf that holds a singleton $X_3$; therefore it is satisfied as well.

Finally, suppose that there are three distinct source colors. In this case we use the encoding specified in Table I on the subgraphs of each source color. It is clear on inspection that $\sum_{i=1}^{3} X_i$ can be recovered from any two of the received values (as from any two of the linear combinations stated, one can deduce the sum $X_1 + X_2 + X_3$).
c) The degree sequence is a permutation of $(1, 2, 3)$. In this case, the degree sequence dictates that there have to be two terminals that share two colors. This implies that the source labels of those colors have to be different. For the subgraphs induced by these colors, we use the encoding proposed in Table I. For the subgraph induced by the remaining color, we perform greedy encoding. For example, suppose that the colors are $(s_1, s_2, t_1, t_2)$, $(s_2, s_3, t_1, t_2)$ and $(s_2, s_3, t_1, t_3)$. As shown in Figure 5, $t_1$ and $t_2$ are clearly satisfied (even without using the information from color $(s_2, s_3, t_1, t_3)$). Terminal $t_3$ has to have a singleton leaf containing $X_1$ by the connectivity condition, and is therefore satisfied.

Together, these arguments establish that, when there are three colors, all terminals can be satisfied.
(iii) Case 3. There exist more than three distinct colors in $G$.

Note that if there are at least four colors in $G$, then (a) there are two colors with the same terminal label, since there are exactly three possible terminal labels, and (b) for the colors with the same terminal label, the source labels necessarily have to be different. Our strategy is as follows. For the terminals that share two colors, use the encoding proposed in Table I. If the remaining terminal has access to only one source color, then use greedy encoding and note that this terminal has to have a singleton leaf. If it has access to at least two source colors, simply use the encoding in Table I for it as well.

■ 
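The classification of Case 2 by the degree sequence of $G_{aux}$ is mechanical. The sketch below, with hypothetical helper names, computes the degrees of $t'_1, t'_2, t'_3$ from a list of colors, each written as a pair (source label, terminal label), and maps the sorted sequence to subcase a), b) or c). With three colors of degree 2 each and terminal degrees at most 3, the sorted sequences $(0, 3, 3)$, $(1, 2, 3)$ and $(2, 2, 2)$ are the only possibilities. The two example lists are the color sets used in the examples of Figures 4 and 5.

```python
from collections import Counter

TERMINALS = ("t1", "t2", "t3")


def gaux_degrees(colors):
    """Degree of each terminal node t'_j in G_aux, i.e. the number of distinct colors
    whose terminal label contains t_j. Colors are ((s_a, s_b), (t_c, t_d)) pairs."""
    deg = Counter(t for _, terminal_label in set(colors) for t in terminal_label)
    return tuple(deg[t] for t in TERMINALS)


def subcase(colors):
    """Return which subcase a), b) or c) of Case 2 (exactly three colors) applies."""
    seq = gaux_degrees(colors)
    if len(set(colors)) != 3 or sum(seq) != 6:
        raise ValueError("Case 2 requires exactly three distinct colors")
    return {(0, 3, 3): "a", (2, 2, 2): "b", (1, 2, 3): "c"}[tuple(sorted(seq))]


FIG4 = [(("s1", "s2"), ("t1", "t2")), (("s1", "s2"), ("t2", "t3")),
        (("s2", "s3"), ("t1", "t3"))]
FIG5 = [(("s1", "s2"), ("t1", "t2")), (("s2", "s3"), ("t1", "t2")),
        (("s2", "s3"), ("t1", "t3"))]
print(gaux_degrees(FIG4), subcase(FIG4))
print(gaux_degrees(FIG5), subcase(FIG5))
```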

It remains to develop the argument in the case when there are exactly two distinct colors in $G$. For this we need to explicitly use the fact that there are two edge-disjoint paths between each $s_i - t_j$ pair.

Lemma 5: Consider a graph $G$ with sources $s_i$, $i = 1, \ldots, 3$, and terminals $t_j$, $j = 1, \ldots, 3$, such that (a) it does not have any $(3, 3)$, $(2, 3)$ or $(3, 2)$ nodes, and (b) there exist at least two edge-disjoint $s_i - t_j$ paths for all $i$ and $j$. Consider the set of all $(2, 2)$ nodes in $G$ and their corresponding colors. If there exist exactly two distinct colors in $G$, then there exists a set of coding vectors such that each terminal can recover $\sum_{i=1}^{3} X_i$.
Proof: As in the proof of Lemma 4, we argue based on the content of the leaves of the terminals. Suppose that the auxiliary bipartite graph $G_{aux}$ is formed. If both colors have the same terminal label (see Figure 6 for an example), then it is clear that the encoding in Table I on the subgraphs induced by the colors suffices for the corresponding terminals. The third terminal has singleton leaves corresponding to each source and can compute $\sum_{i=1}^{3} X_i$.
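The recoverability property of Table I can be checked by brute force. The sketch below (function name hypothetical) searches for coefficients $\lambda, \mu \in GF(p)$, $p$ prime, with $\lambda v_1 + \mu v_2 = (1, 1, 1)$ for each pair of encodings $v_1, v_2$. It also shows why the field characteristic must exceed 2: over $GF(2)$ the encodings $2X_1 + X_2$ and $X_2 + 2X_3$ both collapse to $X_2$.

```python
from itertools import combinations

# Encodings of Table I as coefficient vectors over (X1, X2, X3).
TABLE_I = {
    ("s1", "s2"): (2, 1, 0),   # 2X1 + X2
    ("s2", "s3"): (0, 1, 2),   # X2 + 2X3
    ("s1", "s3"): (1, 0, -1),  # X1 - X3
}


def sum_coefficients(v1, v2, p):
    """Find (lam, mu) with lam*v1 + mu*v2 = (1, 1, 1) over GF(p) (p prime), or None."""
    for lam in range(p):
        for mu in range(p):
            if all((lam * a + mu * b - 1) % p == 0 for a, b in zip(v1, v2)):
                return lam, mu
    return None


for p in (2, 3, 5, 7):
    ok = all(sum_coefficients(TABLE_I[c1], TABLE_I[c2], p) is not None
             for c1, c2 in combinations(TABLE_I, 2))
    print(f"GF({p}): sum recoverable from every pair: {ok}")
```

For odd $p$, the pair $(2X_1 + X_2, X_2 + 2X_3)$ needs $\lambda = \mu = 2^{-1}$, which is exactly where division by 2 enters.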

Another possibility is that the terminal labels of the colors 
are different, but the source labels are the same. It should be 
clear that this case can be handled by greedy encoding on the 
colors. 

Fig. 6. An instance of $G_{aux}$ when there exist exactly two distinct colors under Case 3, such that the terminal labels of the colors are the same.

[Figure: color nodes $(s_1, s_2, t_1, t_2)$ and $(s_2, s_3, t_2, t_3)$.]

Fig. 7. An instance of $G_{aux}$ when there exist exactly two distinct colors under Case 3, such that both the source labels and the terminal labels of the colors are different.

The situation is more complicated when both the terminal and the source labels of the colors are different; see, for example, Figure 7. In the case depicted, greedy encoding does not work, since it satisfies $t_1$ and $t_3$ but not $t_2$. W.l.o.g., we assume that the colors are $(s_1, s_2, t_1, t_2)$ and $(s_2, s_3, t_2, t_3)$. Now, we know that there exist two vertex-disjoint paths between $s_1$ and $t_2$ (a similar argument can be made for $s_3$); recall that every internal vertex has total degree at most 3, so edge-disjoint paths are also internally vertex-disjoint. Each of these paths has a leaf for $t_2$. If one of the leaves contains a singleton $X_1$ (i.e., receives $X_1$ in the clear), then performing greedy encoding on the two colors works, since $t_2$ obtains $X_1 + X_2$, $X_1$ and $X_2 + X_3$, and the other terminals obtain singleton leaves that satisfy their demand. Likewise, if there is a singleton leaf containing $X_3$ on the vertex-disjoint paths from $s_3$ to $t_2$, then greedy encoding works.

Thus, it remains to consider the case in which the corresponding leaves of $t_2$ are all of type $(2, 2)$. This implies that there are at least four distinct leaves of $t_2$ of type $(2, 2)$: two of color $(s_1, s_2, t_1, t_2)$ and two of color $(s_2, s_3, t_2, t_3)$. We now conclude our proof with the following claims.

Consider the subgraph induced by the nodes colored by one of the colors above, w.l.o.g. $(s_1, s_2, t_1, t_2)$, in $G$, together with the $(1, \cdot)$ nodes connected to either $s_1$ or $s_2$ in $G$. Denote this subgraph by $G'$. Consider a random linear network code on the nodes of $G'$ (namely, each node outputs a random linear combination of its incoming information over the underlying finite field of size $2^m$). We show that, with high probability (given that $m$ is large enough), such a code allows both $t_1$ and $t_2$ to receive two linearly independent combinations of $X_1$ and $X_2$ at their leaves. An analogous argument also holds for $t_2$ and $t_3$ when considering the color $(s_2, s_3, t_2, t_3)$ and the information $X_2$ and $X_3$. This suffices to conclude our assertion.

Claim 5: Let $u$ be any $(2, 2)$ leaf in $G'$, and let $U = \alpha X_1 + \beta X_2$ be the incoming information of $u$. With probability at least $(1 - 2^{-m+1})^{|V|}$, it holds that $\alpha \neq 0$ and $\beta \neq 0$.

Proof: Denote by $C = \{c_i\}$ the collection of coefficients used in the random linear network code on $G'$. Namely, each $c_i$ is uniformly distributed in $GF(2^m)$, and the information on each edge $e$ is a linear combination of its incoming information using coefficients from $C$ (each coefficient in $C$ is used only once).

It is not hard to verify that $\alpha$ is a multivariate polynomial in the variables in $C$ of total degree at most $\ell$, where $\ell$ is the length of the longest path between $s_i$ and $u$ (here $i = 1, 2$). Namely, $\ell < n = |V|$. Moreover, each variable $c_i$ in $\alpha$ is of degree at most 1. As $u$ is a $(2, 2)$ leaf and is connected to $s_1$, there is a setting of the variables in $C$ such that $\alpha \neq 0$ (consider, for example, setting the values of the variables in $C$ to match the greedy encoding function discussed previously). Thus, $\alpha$ is not the zero polynomial. We conclude, using Lemma 4 of [14], that $\alpha$ takes the value zero with probability at most $1 - (1 - 2^{-m})^{\ell}$ (over the choice of the values of the variables in $C$). (We note that Lemma 4 of [14] is a slightly refined version of the Schwartz-Zippel lemma.) The same analysis holds for $\beta$. Finally, to study the probability that either $\alpha$ or $\beta$ is zero, we study the polynomial $\alpha \cdot \beta$, of total degree at most $2\ell$, where each variable $c_i$ in $\alpha \cdot \beta$ is of degree at most 2. Our assertion now follows from Lemma 4 of [14]. ■
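The bounds in this proof are instances of an estimate of the form $1 - (1 - d/q)^{\eta}$ for the probability that a nonzero polynomial, in which every variable has degree at most $d$ and whose total degree is at most $d\eta$, vanishes at a uniformly random point; we assume here that this is the form of Lemma 4 of [14], as it matches the bounds quoted above with $q = 2^m$, $d \in \{1, 2\}$ and $\eta = \ell$. The sketch below checks the estimate exhaustively on small toy polynomials over the prime field $GF(5)$, used as a stand-in for $GF(2^m)$, and evaluates $(1 - 2^{-m+1})^{n}$ for illustrative values of $m$ and $n$. All names and polynomials are hypothetical.

```python
from itertools import product


def zero_fraction(poly, num_vars, q):
    """Exact fraction of points of GF(q)^num_vars (q prime) at which poly vanishes."""
    zeros = sum(1 for c in product(range(q), repeat=num_vars) if poly(c) % q == 0)
    return zeros / q ** num_vars


def zero_bound(d, eta, q):
    """1 - (1 - d/q)**eta: bound on Pr[P = 0] for a nonzero P whose variables have
    degree at most d and whose total degree is at most d * eta."""
    return 1 - (1 - d / q) ** eta


def success_bound(m, n):
    """(1 - 2**(1 - m))**n, the form of the success probability in Claims 5 and 6."""
    return (1 - 2.0 ** (1 - m)) ** n


def alpha(c):  # toy stand-in for alpha: total degree 2, every variable of degree <= 1
    return c[0] * c[1] + c[2]


def beta(c):  # toy stand-in for beta
    return c[0] + c[1] * c[2]


def alpha_beta(c):  # every variable of degree <= 2, total degree 4
    return alpha(c) * beta(c)


q = 5
print(zero_fraction(alpha, 3, q), "<=", zero_bound(1, 2, q))
print(zero_fraction(alpha_beta, 3, q), "<=", zero_bound(2, 2, q))
print(success_bound(16, 50))
```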

Consider the terminal $t_2$ and its two edge-disjoint paths from $s_1$, denoted $P_1$ and $P_2$. Let $u_1$ and $u_2$ be the corresponding leaves on paths $P_1$ and $P_2$. As the leaves of $t_2$ are of type $(2, 2)$ and as both $u_1$ and $u_2$ are connected to $s_1$, both $u_1$ and $u_2$ are of color $(s_1, s_2, t_1, t_2)$ and lie in $G'$. The following claim shows that, with high probability (given $m$ large enough), $t_2$ will receive two linearly independent combinations of $X_1$ and $X_2$ at $u_1$ and $u_2$.

Claim 6: Let $U_1 = \alpha_1 X_1 + \beta_1 X_2$ be the incoming information of $u_1$, and $U_2 = \alpha_2 X_1 + \beta_2 X_2$ the incoming information of $u_2$. With probability at least $(1 - 2^{-m+1})^{|V|}$, the vectors $\{(\alpha_i, \beta_i)\}_{i=1,2}$ are linearly independent.

Proof: Our proof follows the line of the proof of Claim 5. Namely, let $C = \{c_i\}$ be the collection of coefficients used in the random linear network code on $G'$. As before, $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ are multivariate polynomials in the variables in $C$. To study the independence between $U_1$ and $U_2$, we study the determinant $F$ of the $2 \times 2$ matrix with rows $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$. The determinant $F$ is of total degree at most $2\ell$, where each variable $c_i$ in $F$ is of degree at most 2. So, to conclude our assertion (via Lemma 4 of [14]), it suffices to prove that $F$ is not the zero polynomial.

To this end, we present an encoding function (a setting of assignments for the variables in $C$) for which $F = 1$. Consider the two disjoint paths connecting $s_1$ and terminal $t_2$ (denoted $P_1$ and $P_2$). Recall that $u_1$ and $u_2$ are the corresponding leaves of color $(s_1, s_2, t_1, t_2)$, where $u_i \in P_i$. Let $v$ be the vertex closest to $s_1$ on these paths that is connected to $s_2$ (ties broken arbitrarily), and assume w.l.o.g. that $v \in P_2$. Let $P_3$ be the path connecting $s_2$ and $v$. Consider the subgraph $H$ of $G'$ consisting of the paths $P_1$, $P_2$ and $P_3$. Using the edges of $H$ alone, one can design a routing scheme such that $u_1$ receives the information $X_1$ in the clear and $u_2$ the information $X_2$. This implies that $(\alpha_1, \beta_1) = (1, 0)$, $(\alpha_2, \beta_2) = (0, 1)$, and $F = 1$. Indeed, just forward $X_1$ on $P_1$, and forward $X_2$ on $P_3$ until it reaches $v$ and then from $v$ to $u_2$ on $P_2$. ■
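The argument of Claim 6 can be mimicked numerically on a toy instance. The sketch below is an illustration under stated assumptions only: it uses the prime field $GF(257)$ in place of $GF(2^m)$, a hypothetical six-node graph with $P_1 = s_1 \to a \to u_1$, $P_2 = s_1 \to v \to u_2$ and $P_3 = s_2 \to v$ (plus an extra edge $s_2 \to a$, so that $u_1$ also sees $X_2$), and hypothetical helper names. It checks that the routing assignment from the proof yields $F = 1$, and that random local coefficients make $F \neq 0$ in almost all trials.

```python
import random

P = 257  # prime field GF(P), a stand-in for GF(2^m) (assumption)

# Hypothetical toy instance: P1 = s1->a->u1, P2 = s1->v->u2, P3 = s2->v, extra edge s2->a.
EDGES = [("s1", "a"), ("s2", "a"), ("a", "u1"),
         ("s1", "v"), ("s2", "v"), ("v", "u2")]
ORDER = ["s1", "s2", "a", "v", "u1", "u2"]  # a topological order
SOURCE_VEC = {"s1": (1, 0), "s2": (0, 1)}   # coding vectors of X1 and X2


def local_pairs():
    """(incoming edge, outgoing edge) pairs at non-source nodes; one coefficient each."""
    return [(e_in, e_out) for e_out in EDGES for e_in in EDGES
            if e_in[1] == e_out[0] and e_out[0] not in SOURCE_VEC]


def global_vectors(coef):
    """Global coding vector (alpha, beta) carried by every edge."""
    vec = {}
    for node in ORDER:
        ins = [e for e in EDGES if e[1] == node]
        for e_out in (e for e in EDGES if e[0] == node):
            if node in SOURCE_VEC:
                vec[e_out] = SOURCE_VEC[node]
            else:
                vec[e_out] = tuple(sum(coef[(e, e_out)] * vec[e][k] for e in ins) % P
                                   for k in range(2))
    return vec


def det_at_leaves(coef):
    """F = alpha_1 * beta_2 - alpha_2 * beta_1 for the leaves u1 and u2 (mod P)."""
    vec = global_vectors(coef)
    (a1, b1), (a2, b2) = vec[("a", "u1")], vec[("v", "u2")]
    return (a1 * b2 - a2 * b1) % P


# Routing from the proof: X1 along P1, X2 along P3 and then along P2.
routing = {pair: 0 for pair in local_pairs()}
routing[(("s1", "a"), ("a", "u1"))] = 1
routing[(("s2", "v"), ("v", "u2"))] = 1
print("F under routing:", det_at_leaves(routing))

rng = random.Random(1)
trials = 2000
full_rank = sum(
    det_at_leaves({pair: rng.randrange(P) for pair in local_pairs()}) != 0
    for _ in range(trials))
print(f"F != 0 in {full_rank}/{trials} random codes")
```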

Now, consider the terminal $t_1$ and its two edge-disjoint paths from $s_1$, denoted $P_1$ and $P_2$. Let $u_1$ and $u_2$ be the corresponding leaves on paths $P_1$ and $P_2$ (to simplify notation, we reuse the notation previously used for $t_2$). Here, we consider two cases. If both $u_1$ and $u_2$ are $(2, 2)$ nodes, then by Claim 6 we are done (with high probability). Namely, with high probability (given $m$ large enough), $t_1$ will receive two linearly independent combinations of $X_1$ and $X_2$ at $u_1$ and $u_2$. Otherwise, $t_1$ has at least one leaf with $X_1$ in the clear. Denote this leaf by $v_1$. Notice that $t_1$ must have at least one $(2, 2)$ leaf (by Claim 3); denote this leaf by $v_2$. Finally, by Claim 5, with high probability the coefficient of $X_2$ at $v_2$ is nonzero, and hence the information present at $v_1$ and at $v_2$ is independent.

Fig. 8. Example of a network with two sources and two terminals, such that there exist two edge-disjoint paths between each source and each terminal. Source node $s_1$ ($s_2$) observes a source of entropy 2, $[a\ a']$ ($[b\ b']$). The terminals seek to reconstruct $[a + b\ \ a' + b']$. However, this is impossible with linear codes.

To conclude, notice that the discussion above (when applied symmetrically to $t_2$, $t_3$ and the color $(s_2, s_3, t_2, t_3)$) implies that all terminals are able to obtain the desired sum $X_1 + X_2 + X_3$ (by an appropriate setting of the encoding functions on their $(\cdot, 1)$ edges).



IX. Discussion and Future Work 

In this work, we have introduced the problem of multicasting the sum of sources over a network. This is a first step towards extending network coding to the general problem of communicating functions over networks.

We have shown that in networks with unit-capacity edges and unit-entropy sources, with at most two sources or two terminals, the sum can be recovered at the terminals as long as there exists a path between each source-terminal pair. Furthermore, we have demonstrated that this characterization does not hold for three-source (3s)/three-terminal (3t) networks. For the 3s/3t case we have shown that if each source-terminal pair is connected by at least two edge-disjoint paths, sum recovery is possible at the terminals. In each of these cases we have presented efficient network code assignment algorithms.

Since the initial publication of the results in Sections IV and V in [15], the result for the case of $n$ sources and two terminals has been proved in an alternate manner in [11] by considering the reversed network (see [21], [22] for related work). The 3-source, 3-terminal example in Section VI was found independently by Rai et al. [23]. However, their proof only rules out linear network codes.

Several questions remain open; we discuss them below.

• Is the two edge-disjoint path condition (between s_i/t_j
pairs) necessary, or can other combinatorial connectivity
requirements characterize the capacity of the network
arithmetic problem for the 3s/3t case? The results of this
paper show that this condition is sufficient. However,
it is not evident whether it is necessary.

• It is unclear whether our techniques extend to the case of
a higher number of sources and terminals. At present, the
case of |S| > 3 and |T| > 3 is completely open.

• Our work can easily be extended to multicasting any
linear function of the sources, Σ_{i=1}^n α_i X_i (each source
node simply creates a new source X'_i = α_i X_i). It would
be interesting to extend our work to more general function
families.

• In our problem formulation, we have considered unit-entropy
sources over unit-capacity networks. However, in
general, one could consider sources of arbitrary entropies
by considering vector sources (as considered in [18]), and
requiring the terminals to recover a vector that contains
componentwise function evaluations. This version of
the problem is also open for the most part. In fact, in
this case even our characterization for |S| = 2 does
not hold. For example, consider the two-sources, two-terminals
network shown in Figure 8, where each edge has
unit capacity. Each source node observes a source of
entropy two, denoted by a vector of length two.
The terminals need to recover the vector sum.

In this network there are two sources, and based on our
result in Section IV it is natural to conjecture that if
max-flow(s_i - t_j) = 2 holds for i, j = 1, 2, then a network
coding assignment exists. The network in Figure 8 satisfies
this connectivity requirement. However, as shown in the
Appendix, recovering the vector sum at both terminals
is not possible with linear codes.

• We have exclusively considered the case of directed 
acyclic networks. An interesting direction to pursue 
would be to examine whether these characterizations hold 
in the case of networks where directed cycles are allowed. 

• Our work has been in the context of zero-error recovery 
of the sum of the sources. It would be interesting to 
examine whether the conclusion changes significantly if 
one allows for recovery with some (small) probability of 
error. 
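The reduction mentioned in the third item above (each source pre-scales its symbol, X'_i = α_i X_i, and an unchanged sum-multicasting code does the rest) can be illustrated with a short Python sketch. It assumes, purely for illustration, that source symbols lie in the prime field GF(P) with P = 257, and it replaces the network by a hypothetical function `sum_network` that returns the sum of whatever the sources inject; no particular topology or code assignment from this paper is modeled.

```python
P = 257  # illustrative prime; source symbols are assumed to lie in GF(P)


def sum_network(symbols):
    """Hypothetical stand-in for any network code that delivers the sum
    (mod P) of the symbols injected at the sources to every terminal."""
    return sum(symbols) % P


def multicast_linear_function(xs, alphas):
    """Deliver sum_i alpha_i * x_i (mod P) using only a sum-multicasting code."""
    # Each source i locally replaces X_i by X'_i = alpha_i * X_i (mod P) ...
    scaled = [(a * x) % P for a, x in zip(alphas, xs)]
    # ... and the unchanged sum-multicasting code does the rest.
    return sum_network(scaled)


xs, alphas = [200, 100, 7], [2, 3, 5]
print(multicast_linear_function(xs, alphas))  # 221
```

The only change relative to multicasting the sum is a local multiplication at each source, which is the sense in which the sum results extend to linear functions.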

X. Appendix 

A. Proof of Lemma 2

We proceed by induction on the number of sources. 

1) Base case n = 1: In this case there is only one source
s'_1, and both terminals need to recover X_1. Note that we
are given the existence of path(s'_1 - t'_1) and path(s'_1 - t'_2)
in G'. In general, these paths can intersect at multiple nodes,
which may imply that there exist multiple paths (for example)
from s'_1 to t'_1. Now, from path(s'_1 - t'_1) and path(s'_1 - t'_2) we
can find the last node where these two paths meet. Let this last
node be denoted u_1. Then, as shown in Figure 9, we can find a
new set of paths from s'_1 to t'_1 and from s'_1 to t'_2 that overlap
from s'_1 to u_1 and have no overlap thereafter. Choose G* to be the
union of this new set of paths. It is easy to see that in G*
there is exactly one path from s'_1 to t'_1 and exactly one path
from s'_1 to t'_2. Moreover, removing any edge from G* would
disconnect s'_1 from at least one of the terminals.







Fig. 9. The figure on the left shows path(s'_1 - t'_1) (in solid lines) and
path(s'_1 - t'_2) (in broken lines). The figure on the right shows that one can
find a new set of paths from s'_1 to t'_1 and t'_2 such that they share edges from
s'_1 to u_1 and have no intersection thereafter.
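The base-case construction (keep a single route from s'_1 up to the last meeting node u_1, then let the two paths separate) can be sketched in Python as follows. Paths are assumed to be given as node lists in a DAG; the function name `merge_at_last_meeting_node` and the choice of reusing the prefix of path(s'_1 - t'_1) up to u_1 are illustrative assumptions, not necessarily the construction drawn in Figure 9.

```python
def merge_at_last_meeting_node(p1, p2):
    """Base case of Lemma 2, sketched.

    p1, p2: node lists of a path s'_1 -> t'_1 and a path s'_1 -> t'_2 in a DAG.
    Returns (u, q1, q2, edges): u is the last node shared by the two paths,
    q1 = p1, q2 follows p1 up to u and then the tail of p2, and edges is
    the edge set of q1 ∪ q2 (the subgraph G*).
    """
    if p1[0] != p2[0]:
        raise ValueError("both paths must start at the same source")
    on_p2 = set(p2)
    # Index (along p1) of the last node of p1 that also lies on p2.
    k1 = max(i for i, v in enumerate(p1) if v in on_p2)
    u = p1[k1]
    k2 = p2.index(u)
    q1 = list(p1)
    # Reuse p1's prefix up to u, then p2's tail. In a DAG, p2's tail avoids p1:
    # a shared node after u on p1 would contradict the choice of u, and one
    # before u on p1 would close a directed cycle.
    q2 = p1[: k1 + 1] + p2[k2 + 1 :]
    edges = set(zip(q1, q1[1:])) | set(zip(q2, q2[1:]))
    return u, q1, q2, edges


p1 = ["s", "a", "m", "n", "t1"]
p2 = ["s", "b", "m", "c", "n", "t2"]
u, q1, q2, g_star = merge_at_last_meeting_node(p1, p2)
print(u, q2)  # n ['s', 'a', 'm', 'n', 't2']
```

In this example the union of the two original paths contains four distinct s-to-t1 paths; after merging, G* consists of the chain s, a, m, n followed by the two separate edges into t1 and t2.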

2) Induction Step: We now assume the induction hypothesis
for n - 1 sources, i.e., there exists a minimal subgraph
G*_{n-1} of G' such that there is exactly one path from s'_i to t'_j
for i = 1, ..., n - 1 and j = 1, 2. Using this hypothesis we
shall show the result in the case when there are n sources.

As a first step, color the edges in the subgraph G*_{n-1} blue
(the remaining edges in G' have no color). The conditions on
G' guarantee the existence of path(s'_n - t'_1) and path(s'_n - t'_2).

Color all edges on path(s'_n - t'_1) and path(s'_n - t'_2) red.
Some edges may therefore carry both colors. Now,
consider the subgraph induced by the union of the blue and
red subgraphs, which we denote G_br.

Find the first node at which path(s'_n - t'_1) intersects the
blue subgraph and call that node u_1. Similarly, find the first
node at which path(s'_n - t'_2) intersects the blue subgraph and
call that node u_2.
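The coloring of G_br and the choice of u_1 and u_2 can be made concrete with a small Python sketch. It assumes G*_{n-1} (the blue subgraph) is given as a set of directed edges and each red path as a node list starting at s'_n; the helper names `color_edges` and `first_blue_node` are hypothetical.

```python
from collections import defaultdict


def color_edges(blue_edges, red_paths):
    """Map each edge of G_br to its set of colors: 'blue', 'red', or both."""
    colors = defaultdict(set)
    for edge in blue_edges:
        colors[edge].add("blue")
    for path in red_paths:
        for edge in zip(path, path[1:]):
            colors[edge].add("red")
    return dict(colors)


def first_blue_node(red_path, blue_edges):
    """First node of red_path (listed from s'_n onward) that lies on the
    blue subgraph, or None if the red path never meets it."""
    blue_nodes = {v for edge in blue_edges for v in edge}
    return next((v for v in red_path if v in blue_nodes), None)


blue = {("s1", "x"), ("x", "t1"), ("x", "t2")}          # G*_{n-1}
red1, red2 = ["sn", "a", "x", "t1"], ["sn", "b", "t2"]  # paths from s'_n
colors = color_edges(blue, [red1, red2])
print(colors[("x", "t1")] == {"blue", "red"})  # True
print(first_blue_node(red1, blue), first_blue_node(red2, blue))  # x t2
```

As the second red path shows, the first intersection can be a terminal itself, since the terminals belong to the blue subgraph.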

Observe that in G*_{n-1} there has to exist a path(s'_i - t'_j) for
some i = 1, ..., n - 1 and j = 1, 2 that passes through u_1.
To see this, assume otherwise. This implies that u_1 does not
lie on any path connecting one of the sources to one of the
terminals. Therefore, the incoming and the outgoing edges of
u_1 can be removed without violating the max-flow conditions
stated in Theorem O. This contradicts the minimality of G*_{n-1}.
Therefore, we are guaranteed that there exists at least one
source with an exclusively blue path from
it to u_1 in G*_{n-1}. A similar statement holds for the node u_2.
We now establish the statement of the lemma when there are
n sources.

Case 1: In G_br there exists a path from u_1 to t'_2 such that
all edges on this path have a blue component.
First, we remove the color red from all edges on
path(s'_n - t'_2) \ path(s'_n - t'_1); in particular, we remove any
exclusively red edge belonging to this set from G_br. We assume that the
edges in path(s'_n - u_1) are essential for ensuring that s'_n and
u_1 are connected. If this is not true, path(s'_n - u_1) can be
preprocessed to make this true.

Next, form a subset of the sources, denoted S^(u_1), in the
following manner. For each source s'_i, i = 1, ..., n, do the
following.
i) If there exists a path (with edges of color red or blue)
from s'_i to u_1, add s'_i to the set S^(u_1).

Let G*^(u_1) denote the subgraph induced by ∪_{s'_i ∈ S^(u_1)} path(s'_i - u_1).
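Forming S^(u_1) amounts to a reverse reachability search from u_1 in G_br, and G*^(u_1) can then be read off as the edges lying on source-to-u_1 paths. The Python sketch below assumes G_br is given as a set of directed edges (colors play no role here, since a path may use red or blue edges) and interprets the union of paths as every edge on some path from a source in S^(u_1) to u_1; the function names are hypothetical.

```python
from collections import defaultdict, deque


def reachable(starts, adj):
    """All nodes reachable from `starts` (inclusive) by BFS over `adj`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def sources_and_subgraph_at(u, edges, sources):
    """Return (S_u, G_u): the sources having a path to u in the DAG `edges`,
    and the set of edges lying on some path from a source in S_u to u."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)
    to_u = reachable([u], rev)               # nodes with a path to u, plus u
    s_u = {s for s in sources if s in to_u}  # the set S^(u_1)
    from_s_u = reachable(list(s_u), fwd)     # nodes reachable from S^(u_1)
    g_u = {(a, b) for a, b in edges if a in from_s_u and b in to_u}
    return s_u, g_u


g_br = {("s1", "x"), ("s2", "x"), ("x", "u1"), ("sn", "a"), ("a", "u1"),
        ("s3", "y"), ("y", "t1"), ("u1", "t1"), ("u1", "t2")}
s_u, g_u = sources_and_subgraph_at("u1", g_br, ["s1", "s2", "s3", "sn"])
print(sorted(s_u))         # ['s1', 's2', 'sn']
print(sorted(g_br - g_u))  # [('s3', 'y'), ('u1', 't1'), ('u1', 't2'), ('y', 't1')]
```

In the example, the edges left after removing G*^(u_1) still connect s3 to t1 and u1 to both terminals, in line with the claim that follows.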

Consider the graph obtained by removing the subgraph
G*^(u_1) from G_br. We denote this graph by G^^. We claim that
the max-flow conditions continue to hold over G^^ for the set
of sources S \ S^(u_1). Furthermore, there still exist path(u_1 - t'_1)
and path(u_1 - t'_2) in G^^.

To see this, note that the max-flow conditions for a source
s'_i ∈ S \ S^(u_1) can be violated only if an edge e belonging to
a path from s'_i to t'_j, j = 1, 2, is removed. This happens only
if there exists a path from e to u_1, which would give a path from
s'_i to u_1 through e, contradicting the fact that s'_i ∈ S \ S^(u_1).
Next, there still exist paths from u_1 to the terminals, since the
edges on these paths are downstream of u_1. If any of them were
removed by the procedure, this would contradict the acyclicity
of the graph.

Note that the subgraph G*^(u_1) contains a set of sources S^(u_1)
and a single node u_1 such that there exists exactly one path
from each source in S^(u_1) to u_1. This has to be true for
the sources in S^(u_1) \ {s'_n}; otherwise the minimality of G*_{n-1}
would be contradicted. It is true for s'_n by construction, since
we removed the red color from path(s'_n - t'_2) \ path(s'_n - t'_1).

Next, introduce an artificial source s_a and an edge s_a → u_1
in G^^. Note that |S \ S^(u_1)| ≤ n - 2, since S^(u_1) contains s'_n
and, as shown above, at least one of the sources s'_1, ..., s'_{n-1};
hence the total number of sources in G^^ (including s_a) is at most n - 1.
Therefore, the induction hypothesis can be applied on G^^,
i.e., there exists a minimal subgraph of G^^ such that there
exists exactly one path from each source in (S \ S^(u_1)) ∪ {s_a}
to each terminal. Suppose that we find this subgraph. Now, remove
s_a and the edge s_a → u_1 from this subgraph and augment it with the
subgraph G*^(u_1) found earlier. We claim that the resulting graph
is minimal and has the property that there exists exactly one
path from each source to each terminal.

To see this, note that there exists only one path from a
source s'_i ∈ S \ S^(u_1) to t'_j, j = 1, 2. This is because even after
the introduction of G*^(u_1) there does not exist a path from
s'_i to u_1 in this graph. Therefore, the introduction of G*^(u_1)
cannot introduce additional paths between s'_i ∈ S \ S^(u_1) and
the terminals. Next, we argue for a source s'_i ∈ S^(u_1). Note
that there exists exactly one path from u_1 to each of the terminals,
so the condition can be violated only if there exist multiple
paths from s'_i ∈ S^(u_1) to u_1, but the construction of G*^(u_1)
rules this out. For the minimality claim, note that we used
the induction hypothesis on G^^ augmented with s_a and the edge
s_a → u_1. Thus, each edge in the resulting graph is essential
to connectivity between s_i - t_j pairs where s_i ∈ S \ S^(u_1). The
only possibility is that there is an edge in G*^(u_1) that is not
essential to connectivity. However, this is not possible for the
blue edges, since they are essential to connectivity (by virtue of
the fact that they also belong to G*_{n-1}). Furthermore, removing
any red edge from G*^(u_1) would also cause s'_n and u_1 to be
disconnected, because of the preprocessing on path(s'_n - u_1)
performed above.

The path from s'_n to u_1 cannot have a (red, blue) edge, since u_1 is the first
node where a red path intersects the blue subgraph.



Case 2: In G_br there exists a path from u_2 to t'_1 such that
all edges on this path have a blue component.
This case can be handled in exactly the same manner as
Case 1 by removing the color red from all edges on
path(s'_n - t'_1) \ path(s'_n - t'_2) and applying similar arguments for u_2.

Case 3: In G_br there (a) does not exist a path with blue
edges from u_1 to t'_2 and (b) does not exist a path with blue
edges from u_2 to t'_1.

As shown previously, u_1 lies on some path from s'_i to t'_j for
some i and j in G*_{n-1}. In the current case there does not exist
a blue path from u_1 to t'_2. Therefore, there has to exist a blue
path from u_1 to t'_1 in G*_{n-1}. A similar argument shows that
there has to exist a blue path from u_2 to t'_2 in G*_{n-1}.

Now, we preprocess the red paths path(s'_n - u_1) and
path(s'_n - u_2) so that (a) they share edges until their last
intersection point, i.e., the node common to both paths that is
furthest away from s'_n, and (b) all red edges in path(s'_n - u_1) ∪
path(s'_n - u_2) are essential to the connectivity of s'_n to u_1 and
u_2. Assume for the discussion below that such preprocessing
has been performed to obtain the new paths.

Now, choose the desired subgraph to be the union of G*_{n-1}
and the red paths path(s'_n - u_1) and path(s'_n - u_2), i.e.,
G* = G*_{n-1} ∪ path(s'_n - u_1) ∪ path(s'_n - u_2). It is evident that G*
is minimal. By the induction hypothesis there exists exactly
one path between s'_i, i = 1, ..., n - 1, and t'_j, j = 1, 2. This
continues to be true in G*, since the red edges cannot be
reached from the blue edges. To see that there is exactly one
path from s'_n to t'_1, assume otherwise and observe that there is
exactly one path from s'_n to u_1 by the construction of the red
paths. Since, by (b), no path from u_2 reaches t'_1, the only way
there can be multiple paths from s'_n to t'_1 is if there are multiple
paths from u_1 to t'_1, but this would contradict the induction
hypothesis, since it would imply that there exists some s'_i,
i = 1, ..., n - 1, that has multiple paths to t'_1. A similar argument
shows that there exists exactly one path from s'_n to t'_2. ■
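The conclusion of Lemma 2 (exactly one path from every source to every terminal, with every edge essential) can be checked mechanically on small examples. The following Python sketch is a brute-force checker, assuming the candidate subgraph is a DAG given as a set of directed edges; the names `count_paths` and `satisfies_lemma` are hypothetical. It counts paths by memoized recursion and tests minimality by deleting one edge at a time, which is quadratic in the number of edges and meant only for illustration.

```python
from collections import defaultdict
from itertools import product


def count_paths(edges, src, dst):
    """Number of directed src -> dst paths in a DAG given as an edge set."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    memo = {}

    def n(v):
        if v == dst:
            return 1
        if v not in memo:
            memo[v] = sum(n(w) for w in adj[v])
        return memo[v]

    return n(src)


def satisfies_lemma(edges, sources, terminals):
    """True if every source has exactly one path to every terminal and
    deleting any single edge disconnects at least one source-terminal pair."""
    pairs = list(product(sources, terminals))
    if any(count_paths(edges, s, t) != 1 for s, t in pairs):
        return False
    return all(
        any(count_paths(edges - {e}, s, t) == 0 for s, t in pairs)
        for e in edges
    )


g_star = {("s1", "x"), ("s2", "x"), ("x", "t1"), ("x", "t2")}
print(satisfies_lemma(g_star, ["s1", "s2"], ["t1", "t2"]))                 # True
print(satisfies_lemma(g_star | {("s1", "t1")}, ["s1", "s2"], ["t1", "t2"]))  # False
```

An edge that lies on no source-terminal path makes the check fail, matching the minimality requirement in the lemma.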

B. Discussion about network in Figure 8

We prove that under linear network coding, recovering [a + b,
a' + b'] at T_1 and T_2 is impossible. Let A_1 and A_2 be matrices
… . Without loss of generality, we can express the received vectors
at terminals T_1 and T_2 as

…

Using simple computations it is not hard to see that for both
terminals to be able to recover [a + b, a' + b']^T we need

…

we require all these matrices to be full-rank. Note that
the full-rank condition requires that all the coefficients
α_1, α_2, β_2, β'_1 and … be non-zero and the matrices
A_1 and B_1 be full-rank. In particular, the required condition
is equivalent to requiring that

…

For the above equality to hold, we definitely need β_2 = 0,
but this would contradict the requirement β_2 ≠ 0 that is
needed for the full-rank condition. ■